TensorFlow 2.0 implementation of the Attention, Learn to Solve Routing Problems! article.
This work was done as part of a final project for the DeepPavlov course Advanced Topics in Deep Reinforcement Learning.
Code of the full project (dynamic version) is located at https://github.com/d-eremeev/ADM-VRP
The current environment implementation is located in the Enviroment.py file (the AgentVRP class).
The class stores information about the current state and the actions taken by the agent.
Main methods (a usage sketch follows the list):
- step(action): transitions to a new state according to the given action.
- get_costs(dataset, pi): returns the cost of each graph in the batch for the given routes pi.
- get_mask(): returns a mask of available actions (allowed nodes).
- all_finished(): checks whether all episodes in the batch are finished (all graphs are solved).
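For orientation, here is a minimal sketch of how these methods could be combined into a greedy rollout. The constructor call, the policy interface, and the mask convention (True = allowed) are assumptions for illustration, not the exact API of this repository:

```python
import tensorflow as tf
from Enviroment import AgentVRP

def greedy_rollout(policy, dataset):
    """Greedy rollout over a batch of VRP instances via the AgentVRP interface."""
    env = AgentVRP(dataset)                      # assumed constructor: holds the batched state
    actions = []
    while not env.all_finished():
        mask = env.get_mask()                    # allowed nodes (assumed convention: True = allowed)
        logits = policy(env, mask)               # assumed policy call returning per-node scores
        masked = tf.where(mask, logits, -1e9 * tf.ones_like(logits))
        action = tf.argmax(masked, axis=-1)      # pick the best allowed node per instance
        env.step(action)                         # transition to the new state
        actions.append(action)
    pi = tf.stack(actions, axis=1)               # routes: node indices, one row per instance
    return env.get_costs(dataset, pi), pi        # the negative of this cost is the RL reward
```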
Let's connect these terms with RL language (a small dictionary):
- State: $X$ - the graph instance (coordinates, demands, etc.) together with information about the node the agent is currently in.
- Action: $\pi_t$ - the decision about which node the agent should go to next.
- Reward: the (negative) tour length.
The AM is trained by policy gradient using the REINFORCE algorithm with a baseline.
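In the notation above (with $L(\pi)$ the tour length of route $\pi$ and $b(X)$ the baseline cost for instance $X$), the standard REINFORCE-with-baseline gradient estimator is

$$\nabla_\theta \mathcal{L}(\theta \mid X) = \mathbb{E}_{p_\theta(\pi \mid X)}\left[\left(L(\pi) - b(X)\right)\nabla_\theta \log p_\theta(\pi \mid X)\right].$$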
Baseline
- The baseline is a copy of the model with fixed weights from one of the preceding epochs.
- A warm-up is used for the early epochs: an exponential moving average of the model's cost over past epochs is mixed with the baseline model's cost.
- The baseline is updated at the end of an epoch if the difference in costs between the candidate model and the baseline is statistically significant (t-test); see the sketch after this list.
- The baseline uses a separate dataset for this evaluation. This dataset is regenerated after each baseline renewal.
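A minimal sketch of the end-of-epoch baseline update described above, assuming the rollouts return NumPy arrays of per-instance costs; the helper names (`rollout`, `candidate_model`, `baseline_model`) are hypothetical and not this repository's actual API:

```python
from scipy.stats import ttest_rel

def maybe_update_baseline(candidate_model, baseline_model, eval_dataset, rollout, alpha=0.05):
    """Copy the candidate's weights into the baseline if the candidate is
    significantly better on the held-out evaluation dataset (one-sided t-test)."""
    candidate_costs = rollout(candidate_model, eval_dataset)   # cost per instance, shape (n,)
    baseline_costs = rollout(baseline_model, eval_dataset)

    # The candidate must at least be better on average.
    if candidate_costs.mean() >= baseline_costs.mean():
        return False

    # Paired t-test; halve the p-value for the one-sided alternative "candidate < baseline".
    _, p_value = ttest_rel(candidate_costs, baseline_costs)
    if p_value / 2 < alpha:
        baseline_model.set_weights(candidate_model.get_weights())
        return True   # the caller should regenerate eval_dataset after this renewal
    return False
```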
- Enviroment.py - environment for the VRP RL agent
- layers.py - MHA layers for encoder
- attention_graph_encoder.py - Graph Attention Encoder
- attention_graph_decoder.py - Graph Attention Decoder
- attention_model.py - Attention Model
- reinforce_baseline.py - class for REINFORCE baseline
- train.py - defines the training loop used in train_with_checkpoint.ipynb (a sketch of a single REINFORCE step follows this list)
- train_with_checkpoint.ipynb - from this notebook one can start training or resume training from a checkpoint
- generate_data.py - various auxiliary functions for data creation, saving and visualisation
- results folder: each run is saved to a folder named ADM_VRP_{graph_size}_{batch_size}, containing training logs, learning curves, and saved models
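For context, a single REINFORCE-with-baseline update in TensorFlow 2 might look like the sketch below. It assumes that calling the model on a batch returns the sampled tour costs and the log-likelihood of the sampled routes; this interface and the function name are assumptions, not the exact code in train.py:

```python
import tensorflow as tf

def reinforce_train_step(model, baseline_model, batch, optimizer):
    """One policy-gradient update: advantage = cost - baseline cost."""
    baseline_cost, _ = baseline_model(batch)          # baseline rollout (no gradients flow here)
    with tf.GradientTape() as tape:
        cost, log_likelihood = model(batch)           # sampled tour lengths and their log-probs
        advantage = tf.stop_gradient(cost - baseline_cost)
        loss = tf.reduce_mean(advantage * log_likelihood)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return tf.reduce_mean(cost), loss
```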
- Open train_with_checkpoint.ipynb and choose the training parameters.
- All outputs will be saved in the current directory.
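As a rough orientation, a typical configuration follows the hyperparameters of the original Attention Model paper; the parameter names below are illustrative and may not match the notebook exactly:

```python
# Illustrative values (taken from the original paper); the names are assumptions
params = {
    "graph_size": 20,           # number of customer nodes per VRP instance
    "batch_size": 512,          # instances per training batch
    "epochs": 100,              # number of training epochs
    "batches_per_epoch": 2500,  # 2500 * 512 = 1,280,000 instances per epoch
    "embedding_dim": 128,       # node embedding size
    "n_encode_layers": 3,       # number of MHA layers in the encoder
    "learning_rate": 1e-4,      # Adam learning rate
}
```

With these values the corresponding results folder would be named ADM_VRP_20_512.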