Reproducing the algorithm described in Rusu et al., 2016.
"Quick" start:
- Run
python natureqn_atari.py
to train the teacher netowork. (This will take ~12 hours.) Skip this step if trained Tensorflow DQN for Pong is saved as a checkpoint. - Run
python distilledqn_atari.py
to train the student network. Make sure the loss function and checkpoint directory are correct.