msinto93/DDPG

Deep Deterministic Policy Gradients (DDPG)

A Tensorflow implementation of a Deep Deterministic Policy Gradient (DDPG) network for continuous control.

Trained on OpenAI Gym environments.

This implementation has been successfully trained and tested on the Pendulum-v0 and BipedalWalker-v2 environments. However, the code can be run 'out of the box' on any environment with a low-dimensional state space and a continuous action space.

This implementation currently holds the high score for the Pendulum-v0 environment on the OpenAI leaderboard.

Requirements

Note: The versions stated are the ones I used; the code will likely still work with other versions.

Usage

The default environment is 'Pendulum-v0'. To use a different environment, pass its name via the --env argument when running the following files.

  $ python train.py

This will train the DDPG on the specified environment and periodically save checkpoints to the /ckpts folder.
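For illustration, the --env option described above could be parsed along the following lines. This is a minimal sketch and not the repository's own code: the function name parse_args and the help text are assumptions; only the flag name --env and its default 'Pendulum-v0' come from this README.

```python
import argparse


def parse_args(argv=None):
    """Parse command-line options (illustrative; only --env is from the README)."""
    parser = argparse.ArgumentParser(description="DDPG options (sketch)")
    # The README states that 'Pendulum-v0' is the default environment.
    parser.add_argument("--env", type=str, default="Pendulum-v0",
                        help="ID of the Gym environment to use")
    # Passing argv explicitly makes the function easy to call from other code;
    # argv=None falls back to the real command line (sys.argv[1:]).
    return parser.parse_args(argv)
```

With a parser like this, running `python train.py --env BipedalWalker-v2` would select the other environment mentioned above, while omitting the flag keeps the default.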

  $ ./run_every_new_ckpt.sh

This shell script should be run alongside the training script so that the latest network can be tested periodically as it trains. It monitors the /ckpts folder and runs test.py on the latest checkpoint every time a new checkpoint is saved.
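The monitoring behaviour described above can be sketched in Python as a simple polling loop. This is not the contents of run_every_new_ckpt.sh; it is an illustrative equivalent. The names latest_checkpoint and watch, the polling interval, and the reliance on the '.index' file that TensorFlow writes next to each checkpoint are all assumptions.

```python
import os
import re
import time

# Assumption: each checkpoint has a file named like 'Pendulum-v0.ckpt-26800.index'.
CKPT_PATTERN = re.compile(r"\.ckpt-(\d+)\.index$")


def latest_checkpoint(ckpt_dir):
    """Return (step, prefix) for the highest-numbered checkpoint, or None."""
    best = None
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.search(name)
        if match:
            step = int(match.group(1))
            prefix = name[:-len(".index")]  # e.g. 'Pendulum-v0.ckpt-26800'
            if best is None or step > best[0]:
                best = (step, prefix)
    return best


def watch(ckpt_dir, on_new_checkpoint, poll_seconds=30.0, max_polls=None):
    """Poll ckpt_dir and call on_new_checkpoint(prefix) when a new step appears."""
    last_step = None
    polls = 0
    while max_polls is None or polls < max_polls:
        found = latest_checkpoint(ckpt_dir)
        if found is not None and found[0] != last_step:
            last_step = found[0]
            on_new_checkpoint(found[1])
        polls += 1
        time.sleep(poll_seconds)
```

The callback is where a test run would be launched, for example by invoking test.py with the --ckpt_file argument shown further down in this README.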

  $ python play.py

Once we have a trained network, we can visualise its performance in the environment by running play.py. This plays the environment on screen using the trained network and can optionally save a GIF.

Note: To reproduce the best 100-episode performance of -123.79 +/- 6.90 that achieved the top score on the 'Pendulum-v0' OpenAI leaderboard, run:

  $ python test.py --ckpt_file 'Pendulum-v0.ckpt-26800'
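A figure such as -123.79 +/- 6.90 summarises the total rewards of 100 test episodes. The sketch below shows one way such a summary could be computed. It assumes the second number is the population standard deviation of the per-episode rewards, which this README does not state; the function names are hypothetical and this is not the code in test.py.

```python
import math


def summarize_rewards(episode_rewards):
    """Return (mean, std) of per-episode total rewards.

    Assumption: 'std' is the population standard deviation (divide by N).
    """
    n = len(episode_rewards)
    if n == 0:
        raise ValueError("need at least one episode reward")
    mean = sum(episode_rewards) / n
    variance = sum((r - mean) ** 2 for r in episode_rewards) / n
    return mean, math.sqrt(variance)


def format_summary(episode_rewards):
    """Format the summary in the 'mean +/- std' style used in this README."""
    mean, std = summarize_rewards(episode_rewards)
    return "{:.2f} +/- {:.2f}".format(mean, std)
```

For example, two episodes with rewards -120.0 and -130.0 give a mean of -125.0 and a standard deviation of 5.0, which format_summary renders as "-125.00 +/- 5.00".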

Results

Result of training the DDPG on the 'Pendulum-v0' environment:

Result of training the DDPG on the 'BipedalWalker-v2' environment:

Environment    Best 100-episode performance    Ckpt file
Pendulum-v0    -123.79 +/- 6.90                ckpt-26800

To-do

  • Train/test on further environments, including Mujoco

References

License

MIT License
