The paper considers a version of the vehicle routing problem where customers’ demands are uncertain. The focus is on dynamically routing a single vehicle to serve the demands of a known set of geographically dispersed customers during real-time operations. The goal consists of minimizing the expected distance traveled in order to serve all customers’ demands. Since actual demand is revealed upon arrival of the vehicle at the location of each customer, fully exploiting this feature requires a dynamic approach.
This work studies the suitability of the emerging field of neuro-dynamic programming (NDP) in providing approximate solutions to this difficult stochastic combinatorial optimization problem. The paper compares the performance of two NDP algorithms: optimistic approximate policy iteration and a rollout policy. While the former improves the performance of a nearest-neighbor policy by 2.3%, the computational results indicate that the rollout policy generates higher quality solutions. The implication for the practitioner is that the rollout policy is a promising candidate for vehicle routing applications where a dynamic approach is required.