
DDPG with Hindsight Experience Replay (HER)

This is a PyTorch implementation of Deep Deterministic Policy Gradient (DDPG) with Hindsight Experience Replay (HER). The code is adapted from @julianalverio's reimplementation. For more details, please see the original HER paper.
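As a quick refresher on the base algorithm: DDPG trains a deterministic actor against a Q-value critic, with Polyak-averaged target networks stabilizing the bootstrapped targets. Below is a minimal PyTorch sketch of one update step; the function, the batch layout, and the gamma/tau values are illustrative, not this repo's actual code.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.98, tau=0.05):
    """One DDPG gradient step on a batch of transitions (illustrative)."""
    s, a = batch["obs"], batch["action"]
    r, s2 = batch["reward"], batch["next_obs"]

    # Critic: regress Q(s, a) toward the bootstrapped target value.
    with torch.no_grad():
        target_q = r + gamma * target_critic(s2, target_actor(s2))
    critic_loss = F.mse_loss(critic(s, a), target_q)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Actor: maximize the critic's value of the actor's own actions.
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

    # Polyak-average the target networks toward the online networks.
    for net, target in ((actor, target_actor), (critic, target_critic)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```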

Getting Started

You can follow the steps below to run DDPG (with HER) on the OpenAI Gym Fetch environments.

Setting up a conda environment

conda create -n ddpg python=3.6 tensorflow mpi4py
conda activate ddpg

Installing the required packages

LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH pip install mujoco_py==2.0.2.4 torch tensorboard==1.14.0 opencv-python scipy GPUtil cloudpickle requests future gym[all]
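Before training, you can sanity-check the installation by importing the key packages and instantiating a Fetch environment. This snippet is just a quick check, not part of the repo; as with the commands above, LD_LIBRARY_PATH may need to point at the MuJoCo binaries when you run it.

```python
import torch
import mujoco_py  # raises here if MuJoCo or mujoco_py is misconfigured
import gym

env = gym.make("FetchPickAndPlace-v1")
obs = env.reset()
# Goal-based Fetch observations are dicts with these three keys.
print(sorted(obs.keys()))  # ['achieved_goal', 'desired_goal', 'observation']
print("CUDA available:", torch.cuda.is_available())
```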

Training the agent

After setting up the packages, you can run run.py as follows to train the agent.

LD_LIBRARY_PATH=$HOME/.mujoco/mujoco200/bin:$LD_LIBRARY_PATH python run.py --num_timesteps 10000000 --num_workers 25 --env FetchPickAndPlace-v1 --replay_strategy future

There are a few command-line parameters you can set.

  • env: The OpenAI Gym Fetch environment to train in.
  • replay_strategy: The goal-replay strategy. Set it to future to use HER, or to none to use the original DDPG (see the sketch after this list for what future does).
  • num_workers: The number of parallel workers used to collect rollouts.
  • num_timesteps: The total number of environment timesteps to train for (10000000 in the example above).
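For intuition, here is a minimal sketch of the future strategy from the HER paper: each stored transition is duplicated with goals that were actually achieved later in the same episode, and the sparse reward is recomputed for the substituted goal. The buffer layout, function name, and k=4 default are illustrative, not this repo's code; compute_reward follows gym's GoalEnv.compute_reward signature.

```python
import numpy as np

def her_future_relabel(episode, compute_reward, k=4):
    """Relabel transitions with goals achieved later in the episode.

    `episode` is a list of dicts with keys 'obs', 'action',
    'achieved_goal', and 'desired_goal' (illustrative layout).
    Returns the original transitions plus k relabeled copies per step.
    """
    relabeled = []
    T = len(episode)
    for t, tr in enumerate(episode):
        relabeled.append(tr)  # always keep the original transition
        for _ in range(k):
            # Sample a goal actually achieved at or after step t.
            future_t = np.random.randint(t, T)
            new_goal = episode[future_t]["achieved_goal"]
            new_tr = dict(tr, desired_goal=new_goal)
            # Recompute the sparse reward under the substituted goal.
            new_tr["reward"] = compute_reward(tr["achieved_goal"], new_goal, None)
            relabeled.append(new_tr)
    return relabeled
```

Because many of the relabeled goals are, by construction, actually achieved, the agent sees far more non-failure rewards than it would under the sparse original goals, which is what makes learning tractable.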

Results

After training, you should get learning curves similar to the figure below. With HER, the agent in FetchPickAndPlace-v1 reaches a score of about 0.9 in around 400 epochs. Without HER, i.e. with the original DDPG, the score stays around 0.03 to 0.04, meaning the agent essentially never learns the task (a sketch of how such a success score can be measured follows the figure).


Figure: training results for the Fetch Pick and Place task; (red) DDPG with HER, (blue) original DDPG.
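The score is presumably the per-epoch test success rate, as in the HER paper. Below is a minimal sketch of how such a success rate can be measured, assuming gym's Fetch environments' info['is_success'] flag and a hypothetical trained policy callable.

```python
import gym
import numpy as np

def evaluate(policy, env_name="FetchPickAndPlace-v1", n_episodes=100):
    """Estimate a policy's success rate over n_episodes test rollouts.

    `policy` is a hypothetical callable mapping (observation, desired_goal)
    to an action; substitute your trained DDPG actor here.
    """
    env = gym.make(env_name)
    successes = []
    for _ in range(n_episodes):
        obs = env.reset()
        done, info = False, {}
        while not done:
            action = policy(obs["observation"], obs["desired_goal"])
            obs, reward, done, info = env.step(action)
        # gym's Fetch environments report task success in the info dict.
        successes.append(float(info.get("is_success", 0.0)))
    return float(np.mean(successes))
```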
