Dynamic Attention Model for Vehicle Routing Problems

This repository contains a TensorFlow 2 implementation of the article "A Deep Reinforcement Learning Algorithm Using Dynamic Attention Model for Vehicle Routing Problems".

Dmitry Eremeev, Alexey Pustynnikov

This work was done as a final project for the DeepPavlov course Advanced Topics in Deep Reinforcement Learning. A non-dynamic version of this approach (Attention, Learn to Solve Routing Problems!), which was implemented as part of the project, can be found at https://github.com/alexeypustynnikov/AM-VRP.

One of the important applications of combinatorial optimization is the vehicle routing problem (VRP), in which the goal is to find the best routes for a fleet of vehicles visiting a set of locations. Usually, "best" means the routes with the least total distance or cost.

We consider only one particular case of the general VRP: the Capacitated Vehicle Routing Problem (CVRP), in which the vehicle has a limited carrying capacity for the goods that must be delivered.

VRP is an NP-hard problem (Lenstra and Rinnooy Kan, 1981).

Exact algorithms are only efficient for small problem instances. A number of near-optimal algorithms have been introduced in the academic literature. There are also multiple professional tools for solving various VRP variants (e.g. Google OR-Tools).

Attention Model Approach

The structural features of the input graph instance are extracted by the encoder. Then the solution is constructed incrementally by the decoder.

Specifically, at each construction step, the decoder predicts a distribution over nodes, then one node is selected and appended to the end of the partial solution.
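The construction step described above can be illustrated with a minimal sketch in plain Python. It assumes the decoder outputs one score (logit) per node and that a boolean list marks which nodes are still allowed; the function names `masked_softmax` and `select_node` and the example scores are hypothetical and are not taken from this repository.

```python
import math
import random

def masked_softmax(logits, allowed):
    """Softmax over logits that gives zero probability to disallowed nodes."""
    # Subtract the largest allowed logit for numerical stability.
    m = max(l for l, ok in zip(logits, allowed) if ok)
    exps = [math.exp(l - m) if ok else 0.0 for l, ok in zip(logits, allowed)]
    total = sum(exps)
    return [e / total for e in exps]

def select_node(logits, allowed, rng, greedy=False):
    """Pick the next node from the (assumed) decoder distribution."""
    probs = masked_softmax(logits, allowed)
    if greedy:
        return max(range(len(probs)), key=probs.__getitem__)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical scores for 4 nodes; node 2 is masked out (e.g. already visited).
logits = [0.5, 2.0, 3.0, 1.0]
allowed = [True, True, False, True]

partial_solution = [0]  # start at the depot
partial_solution.append(
    select_node(logits, allowed, random.Random(0), greedy=True)
)
```

Sampling (`greedy=False`) is the usual choice during training, while greedy selection is a common choice at evaluation time; which one this repository uses at each stage is not stated here.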

Main ideas:

Use RL to create an agent that can learn heuristics and provide suboptimal solutions.
Make use of Graph Attention Networks (GAT) to create appropriate graph embeddings for the agent.
The policy of the RL agent is governed by the decoder.

Architecture:

Dynamic Attention Model (AM-D) Approach:

After the vehicle returns to the depot, the remaining nodes can be considered as a new (smaller) instance (graph) to be solved.
Idea: update the embeddings of the remaining nodes using the encoder after the agent arrives back at the depot.
Implementation:

Force each RL agent to wait for the others once it arrives at the depot $x_0$.
When every agent in the batch is at the depot, apply the encoder with a mask to the whole batch.
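The synchronisation described above can be sketched as a control loop. This is only an illustration under stated assumptions: `encode` is a stand-in that returns the indices of unvisited nodes instead of real embeddings, `first_fit_policy` replaces the learned decoder, and every demand is assumed to fit within the vehicle capacity. None of these names come from the repository.

```python
def encode(instances, visited):
    """Stand-in encoder: per instance, keep the depot and unvisited nodes."""
    return [[n for n in range(len(inst)) if n == 0 or n not in seen]
            for inst, seen in zip(instances, visited)]

def first_fit_policy(candidates, demands, seen, load):
    """Hypothetical policy: first unvisited node that fits, else the depot."""
    for n in candidates:
        if n != 0 and n not in seen and demands[n] <= load:
            return n
    return 0

def rollout(instances, policy, capacity):
    """Build tours for a batch, re-encoding whenever all agents are at the depot."""
    batch = len(instances)
    visited = [set() for _ in range(batch)]
    tours = [[0] for _ in range(batch)]
    n_encodings = 0
    while any(len(v) < len(inst) - 1 for v, inst in zip(visited, instances)):
        # Every agent is at the depot here: refresh the whole batch at once.
        embeddings = encode(instances, visited)
        n_encodings += 1
        load = [capacity] * batch
        # Agents whose instance is already solved simply stay at the depot.
        at_depot = [len(v) == len(inst) - 1
                    for v, inst in zip(visited, instances)]
        while not all(at_depot):
            for b in range(batch):
                if at_depot[b]:
                    continue  # wait for the other agents
                node = policy(embeddings[b], instances[b], visited[b], load[b])
                tours[b].append(node)
                if node == 0:
                    at_depot[b] = True
                else:
                    visited[b].add(node)
                    load[b] -= instances[b][node]
    return tours, n_encodings

# Each instance is a list of demands; index 0 is the depot.
tours, n_encodings = rollout([[0, 3, 3, 3], [0, 2, 2]], first_fit_policy, 6)
```

In this toy run the second instance finishes after one route and then waits, while the first instance triggers a second encoding pass for its remaining node.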

Environment:

The current environment implementation is located in the enviroment.py file (AgentVRP class).

The class stores information about the current state and the actions taken by the agent.

Main methods:

step(action): transitions to a new state according to the action.
get_costs(dataset, pi): returns the cost of each graph in the batch for the paths pi chosen by the agent.
get_mask(): returns a mask with available actions (allowed nodes).
all_finished(): checks if all games in the batch are finished (all graphs are solved).
partial_finished(): checks if partial solutions for all graphs have been built, i.e. all agents have come back to the depot.
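To make the role of these methods concrete, here is a toy single-instance environment. It is a hypothetical sketch, not the AgentVRP class: the real class works on batches of tensors, and its constructor, signatures and masking rules may differ. Only the method names mirror the list above.

```python
class ToyVRPEnv:
    """Single-instance toy CVRP state; node 0 is the depot (assumption)."""

    def __init__(self, demands, capacity):
        self.demands = demands
        self.capacity = capacity
        self.load = capacity      # remaining vehicle capacity
        self.position = 0         # current node
        self.visited = {0}
        self.route = [0]

    def get_mask(self):
        """True marks nodes the agent may move to next."""
        mask = [n not in self.visited and d <= self.load
                for n, d in enumerate(self.demands)]
        # The depot is allowed only when the agent is away from it.
        mask[0] = self.position != 0
        return mask

    def step(self, action):
        """Move to `action` and update load and visited set."""
        if not self.get_mask()[action]:
            raise ValueError(f"node {action} is not allowed")
        self.position = action
        self.route.append(action)
        if action == 0:
            self.load = self.capacity  # refill at the depot
        else:
            self.visited.add(action)
            self.load -= self.demands[action]

    def partial_finished(self):
        """The agent is back at the depot, so a partial solution is complete."""
        return self.position == 0

    def all_finished(self):
        """All customers are served and the agent is at the depot."""
        return len(self.visited) == len(self.demands) and self.position == 0

env = ToyVRPEnv(demands=[0, 2, 2, 3], capacity=4)
for action in [1, 2, 0, 3, 0]:
    env.step(action)
```

After these five steps the route is complete: the vehicle serves nodes 1 and 2, returns to refill, then serves node 3.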

Connection with RL language:

State: $X$ - the graph instance (coordinates, demands, etc.) together with the node in which the agent is currently located.
Action: $\pi_t$ - the node to which the agent should go next.
Reward: the negative tour length.
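As a small worked example of the reward, the sketch below computes the negative tour length for one instance. It assumes Euclidean distances, a depot at index 0, and a tour that starts and ends at the depot; the function name `tour_length` and the coordinates are illustrative only.

```python
import math

def tour_length(coords, pi):
    """Length of the route depot -> pi[0] -> ... -> pi[-1] -> depot."""
    path = [0] + list(pi) + [0]
    return sum(math.dist(coords[a], coords[b]) for a, b in zip(path, path[1:]))

# Depot at the origin and two customers forming a 3-4-5 right triangle.
coords = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
pi = [1, 2]
reward = -tour_length(coords, pi)  # 3 + 4 + 5 = 12, so the reward is -12.0
```

An intermediate return to the depot is expressed by a 0 inside `pi`, for example `[1, 0, 2]`, which lengthens the tour to 16.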

Model Training:

AM-D is trained by policy gradient using the REINFORCE algorithm with a baseline.
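A minimal sketch of the quantity being minimised, in plain Python rather than TensorFlow: the surrogate loss is the batch mean of (cost minus baseline cost) times the log-probability of the sampled tour. The function name `reinforce_loss` and the numbers are hypothetical; in an autodiff framework the advantage term would be treated as a constant so that gradients flow only through the log-probabilities.

```python
def reinforce_loss(costs, baseline_costs, log_probs):
    """Surrogate loss: mean over the batch of (cost - baseline) * log p(pi)."""
    advantages = [c - b for c, b in zip(costs, baseline_costs)]
    return sum(a * lp for a, lp in zip(advantages, log_probs)) / len(costs)

# Two sampled tours: the first beats the baseline, the second does not.
loss = reinforce_loss(
    costs=[10.0, 12.0],
    baseline_costs=[11.0, 11.0],
    log_probs=[-2.0, -3.0],
)
```

Minimising this loss raises the probability of tours that are cheaper than the baseline and lowers it for tours that are more expensive.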

Baseline

The baseline is a copy of the model with fixed weights from one of the preceding epochs.
Use warm-up for early epochs: mix the exponential moving average of the model cost over past epochs with the baseline model.
Update the baseline at the end of an epoch if the difference in costs between the candidate model and the baseline is statistically significant (t-test).
The baseline uses a separate dataset for this validation. This dataset is updated after each baseline renewal.
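The update rule and the warm-up mix can be sketched as follows. Several details here are assumptions rather than facts about reinforce_baseline.py: the test is taken to be one-sided and paired, the significance level is 0.05, the p-value uses a normal approximation to the t distribution (reasonable only for large validation sets), and the warm-up weight grows linearly with the epoch. All function and parameter names are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def should_update_baseline(candidate_costs, baseline_costs, alpha=0.05):
    """One-sided paired t-test: is the candidate significantly cheaper?"""
    diffs = [c - b for c, b in zip(candidate_costs, baseline_costs)]
    if mean(diffs) >= 0:
        return False  # candidate is not better on average
    sd = stdev(diffs)
    if sd == 0:
        return True   # uniformly better, no variance to test
    t = mean(diffs) / (sd / math.sqrt(len(diffs)))
    # Normal approximation to the t distribution (assumption, large samples).
    p_value = NormalDist().cdf(t)
    return p_value < alpha

def warmup_baseline(model_cost, ema_cost, epoch, n_warmup_epochs=1):
    """Mix the baseline-model cost with the moving-average cost during warm-up."""
    w = min(1.0, (epoch + 1) / n_warmup_epochs)
    return w * model_cost + (1.0 - w) * ema_cost

# Costs of both models on the same (hypothetical) validation instances.
candidate = [9.0, 9.5, 9.2, 9.1, 9.4, 9.3]
baseline = [10.0, 10.1, 10.3, 9.9, 10.2, 10.0]
update = should_update_baseline(candidate, baseline)
```

With the weight `w` reaching 1 after the warm-up epochs, the baseline value comes entirely from the baseline model from then on.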

Example

Files Description:

Implementation in TensorFlow 2

AM-D for VRP Report.ipynb - demo report notebook
enviroment.py - environment for the VRP RL agent
layers.py - MHA layers for encoder
attention_graph_encoder.py - Graph Attention Encoder
reinforce_baseline.py - class for REINFORCE baseline
attention_dynamic_model.py - main model and decoder
train.py - defines the training loop used in train_model.ipynb
train_model.ipynb - from this file one can start training or continue training from a checkpoint
utils.py and utils_demo.py - various auxiliary functions for data creation, saving and visualisation
lkh3_baseline folder - everything for running the LKH algorithm, plus logs
results folder - subfolders are named ADM_VRP_{graph_size}_{batch_size}; each contains training logs, learning curves and saved models

Training procedure:

Open train_model.ipynb and choose the training parameters.
All outputs are saved in the current directory.

nusmadrl/ADM-VRP