Reproducing Policy Distillation (DeepMind paper ICLR 2016)

cs234-policydist

NOTE: THIS CODE IS NOT MAINTAINED. PLEASE DO NOT USE IT.

Reproducing the algorithm described in Rusu et al., 2016.

"Quick" start:

  • Run python natureqn_atari.py to train the teacher network. (This will take ~12 hours.) Skip this step if a trained TensorFlow DQN for Pong is already saved as a checkpoint.
  • Run python distilledqn_atari.py to train the student network. Make sure the loss function and checkpoint directory are set correctly; a sketch of the distillation loss follows this list.
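
For reference, here is a minimal sketch of the KL-based distillation loss described in Rusu et al., 2016, which the student is trained against: the teacher's Q-values are sharpened with a small softmax temperature and the student's policy is pushed toward the resulting distribution. The function name, the default temperature, and the NumPy formulation are illustrative assumptions, not the actual code in distilledqn_atari.py.

```python
import numpy as np

def kl_distillation_loss(teacher_q, student_q, tau=0.01):
    """KL divergence between the temperature-sharpened teacher policy and the
    student policy, following Rusu et al., 2016.

    teacher_q, student_q: (batch, num_actions) arrays of Q-value outputs.
    tau: softmax temperature applied to the teacher; the paper reports that a
    small tau (i.e. a sharpened teacher distribution) works best for Q-values.
    All names and the default tau here are illustrative assumptions.
    """
    def softmax(x, t):
        z = x / t
        z -= z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
        e = np.exp(z)
        return e / e.sum(axis=1, keepdims=True)

    p_teacher = softmax(teacher_q, tau)  # sharpened teacher policy
    p_student = softmax(student_q, 1.0)  # student policy at temperature 1
    # KL(teacher || student), averaged over the batch
    kl = np.sum(p_teacher * (np.log(p_teacher + 1e-12) - np.log(p_student + 1e-12)), axis=1)
    return float(kl.mean())
```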
