cs234-policydist

NOTE: THIS CODE IS NOT MAINTAINED. PLEASE DO NOT USE IT.

Reproducing the policy distillation algorithm described in Rusu et al., 2016.

"Quick" start:

Run python natureqn_atari.py to train the teacher network. (This will take ~12 hours.) Skip this step if a trained TensorFlow DQN for Pong is already saved as a checkpoint.
Run python distilledqn_atari.py to train the student network. Make sure the loss function and checkpoint directory are correct.
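
Since the student step depends on picking the right loss function, here is a minimal NumPy sketch of the three distillation losses discussed in Rusu et al., 2016 (negative log-likelihood, mean squared error, and KL divergence with a temperature). This is an illustration only and is not taken from distilledqn.py: the function names, the batch averaging, and the default `tau` are assumptions made for this example.

```python
import numpy as np


def log_softmax(x):
    # Numerically stable log-softmax over the last (action) axis.
    z = x - np.max(x, axis=-1, keepdims=True)
    return z - np.log(np.sum(np.exp(z), axis=-1, keepdims=True))


def nll_loss(teacher_q, student_q):
    # Cross-entropy of the student's softmax against the teacher's greedy action.
    best = np.argmax(teacher_q, axis=-1)
    logp = log_softmax(student_q)
    return -np.mean(logp[np.arange(len(best)), best])


def mse_loss(teacher_q, student_q):
    # Squared error between teacher and student Q-values, averaged over the batch.
    return np.mean(np.sum((teacher_q - student_q) ** 2, axis=-1))


def kl_loss(teacher_q, student_q, tau=0.01):
    # KL(softmax(teacher_q / tau) || softmax(student_q)), averaged over the batch.
    # tau is a hypothetical default; a small value sharpens the teacher distribution.
    log_p = log_softmax(teacher_q / tau)
    log_q = log_softmax(student_q)
    return np.mean(np.sum(np.exp(log_p) * (log_p - log_q), axis=-1))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_q = rng.normal(size=(4, 6))  # batch of 4 states, 6 actions (made-up data)
    student_q = rng.normal(size=(4, 6))
    print("NLL:", nll_loss(teacher_q, student_q))
    print("MSE:", mse_loss(teacher_q, student_q))
    print("KL :", kl_loss(teacher_q, student_q))
```

All three functions take arrays of shape (batch, num_actions) holding the teacher and student Q-values for the same states. Whichever loss the training script uses, it helps to confirm that it is zero (or near zero) when the student output matches the teacher.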
