Distributed Distributional Deep Deterministic Policy Gradients (D4PG)

A Tensorflow implementation of a Distributed Distributional Deep Deterministic Policy Gradients (D4PG) network for continuous control.

D4PG builds on the Deep Deterministic Policy Gradients (DDPG) approach (paper, code), making several improvements, including a distributional critic, distributed agents running on multiple threads to collect experience, prioritised experience replay (PER), and N-step returns.
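As an illustration of the N-step return idea, each transition stored in the replay buffer collapses the next N rewards into a single discounted return and bootstraps from the state N steps ahead. This is a hedged sketch of that idea, not the code used in this repository:

  def n_step_transition(rewards, gamma, n, bootstrap_state, done):
      """Collapse the next n rewards into one discounted return.

      rewards         : the n immediate rewards r_t, ..., r_{t+n-1}
      gamma           : discount factor
      n               : number of steps to accumulate
      bootstrap_state : state observed after the n-th step
      done            : True if the episode ended within these n steps
      """
      n_step_return = sum((gamma ** i) * r for i, r in enumerate(rewards[:n]))
      # The critic then bootstraps from bootstrap_state with discount gamma**n
      # (and does not bootstrap at all if the episode terminated).
      return n_step_return, bootstrap_state, gamma ** n, done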

Trained on OpenAI Gym environments.

This implementation has been successfully trained and tested on the Pendulum-v0, BipedalWalker-v2 and LunarLanderContinuous-v2 environments. This code can however be run on any environment with a low-dimensional (non-image) state space and continuous action space.
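A quick way to check whether a Gym environment meets these assumptions (illustrative sketch; the environment name below is only an example):

  import gym
  from gym.spaces import Box

  env = gym.make('LunarLanderContinuous-v2')
  # D4PG requires a continuous (Box) action space ...
  assert isinstance(env.action_space, Box)
  # ... and this implementation expects a flat, low-dimensional state vector.
  assert isinstance(env.observation_space, Box) and len(env.observation_space.shape) == 1
  print(env.observation_space.shape, env.action_space.shape)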

This implementation currently holds the top score for the Pendulum-v0 environment on the OpenAI Gym leaderboard.

Requirements

Note: The versions stated are those I used; other versions will likely still work.

Usage

The default environment is 'Pendulum-v0'. To use a different environment, simply change the ENV parameter in params.py before running the following scripts.
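For example, switching to the BipedalWalker environment might look like the following (the exact layout and attribute names in params.py may differ; this is just a sketch):

  # In params.py
  ENV = 'BipedalWalker-v2'   # default is 'Pendulum-v0'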

To train the D4PG network, run

  $ python train.py

This will train the network on the specified environment and periodically save checkpoints to the /ckpts folder.
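A minimal sketch of the periodic checkpointing pattern in TensorFlow 1.x (the training script's actual variable names and save interval are assumptions here):

  import tensorflow as tf

  SAVE_CKPT_STEP = 10000                       # assumed save interval
  # A stand-in variable so the Saver has something to checkpoint
  # (in practice this would be the actor/critic network weights).
  dummy_weights = tf.Variable(tf.zeros([1]))

  saver = tf.train.Saver(max_to_keep=None)
  with tf.Session() as sess:
      sess.run(tf.global_variables_initializer())
      for step in range(1, 1000001):
          # ... one training update would go here ...
          if step % SAVE_CKPT_STEP == 0:
              saver.save(sess, './ckpts/Pendulum-v0.ckpt', global_step=step)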

To test the saved checkpoints during training, run

  $ python test_every_new_ckpt.py

This should be run alongside the training script, allowing the latest checkpoints to be tested periodically as the network trains. This script invokes the run_every_new_ckpt.sh shell script, which monitors the given checkpoint directory and runs the test.py script on the latest checkpoint every time a new checkpoint is saved. Test results are saved to a text file in the /test_results folder (optional).
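The repository does this monitoring with the run_every_new_ckpt.sh shell script; a minimal Python equivalent of the same idea (paths and polling interval are assumptions) looks like this:

  import subprocess
  import time

  import tensorflow as tf

  CKPT_DIR = './ckpts'       # assumed checkpoint directory
  POLL_SECONDS = 60          # assumed polling interval

  last_tested = None
  while True:
      latest = tf.train.latest_checkpoint(CKPT_DIR)
      if latest is not None and latest != last_tested:
          subprocess.call(['python', 'test.py'])   # test the newly saved checkpoint
          last_tested = latest
      time.sleep(POLL_SECONDS)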

Once we have a trained network, we can visualise its performance in the environment by running

  $ python play.py

This will play the environment on screen using the trained network and save a GIF (optional).
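A minimal sketch of the render-and-save-GIF idea, with a random policy standing in for the trained actor (imageio and the classic Gym render API are assumptions, not necessarily what play.py uses):

  import gym
  import imageio

  env = gym.make('Pendulum-v0')
  frames, state, done = [], env.reset(), False
  while not done:
      frames.append(env.render(mode='rgb_array'))
      action = env.action_space.sample()        # replace with the trained actor's output
      state, reward, done, _ = env.step(action)
  env.close()

  imageio.mimsave('Pendulum-v0.gif', frames)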

Note: To reproduce the best 100-episode performance of -123.11 +/- 6.86 that achieved the top score on the 'Pendulum-v0' OpenAI leaderboard, run

  $ python test.py

specifying the test_params.ckpt_file parameter in params.py as Pendulum-v0.ckpt-660000.
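For reference, a 100-episode score of this form is simply the mean and standard deviation of the episode returns. A hedged sketch of the calculation, again with a random policy standing in for the restored actor:

  import gym
  import numpy as np

  env = gym.make('Pendulum-v0')
  returns = []
  for _ in range(100):
      state, done, ep_return = env.reset(), False, 0.0
      while not done:
          action = env.action_space.sample()    # replace with the restored actor
          state, reward, done, _ = env.step(action)
          ep_return += reward
      returns.append(ep_return)

  print('{:.2f} +/- {:.2f}'.format(np.mean(returns), np.std(returns)))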

Results

Result of training the D4PG on the 'Pendulum-v0' environment:

Result of training the D4PG on the 'BipedalWalker-v2' environment:

To-Do

Result of training the D4PG on the 'LunarLanderContinuous-v2' environment:

To-Do

  Environment    Best 100-episode performance    Ckpt file
  Pendulum-v0    -123.11 +/- 6.86                ckpt-660000

To-do

  • Train/test on further environments, including Mujoco

References

  • Barth-Maron et al., "Distributed Distributional Deterministic Policy Gradients", ICLR 2018 (arXiv:1804.08617)
  • Lillicrap et al., "Continuous control with deep reinforcement learning", ICLR 2016 (arXiv:1509.02971)

License

MIT License
