TD3-TensorFlow

TD3 for OpenAI Gym environments using TensorFlow

To address the function approximation error of Actor-Critic methods, Fujimoto et al. proposed the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm.

Using a total of six neural networks (an Actor, two Critics, and a target copy of each), TD3 reduces the Q-value overestimation of Actor-Critic methods by taking the minimum of the two target Critics' estimates when forming the Bellman target, and delays updates to the Actor relative to the Critics.
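For illustration, here is a minimal sketch of that clipped double-Q target in TensorFlow. This is not this repository's code: `actor_target`, `critic_1_target`, and `critic_2_target` are placeholders for the target networks, and `gamma` is the discount factor.

```python
import tensorflow as tf

# Sketch (not this repository's code) of the clipped double-Q target used by TD3.
# `actor_target`, `critic_1_target`, `critic_2_target` stand in for the target networks.
def td3_target(rewards, next_states, dones,
               actor_target, critic_1_target, critic_2_target, gamma=0.99):
    next_actions = actor_target(next_states)          # target policy's action
    q1 = critic_1_target(next_states, next_actions)   # first target Critic's estimate
    q2 = critic_2_target(next_states, next_actions)   # second target Critic's estimate
    min_q = tf.minimum(q1, q2)                        # keep the smaller estimate
    # Bellman target; (1 - dones) removes the bootstrap term at episode ends.
    return rewards + gamma * (1.0 - dones) * min_q
```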

The authors' original implementation is written in PyTorch and served as the basis for the TensorFlow version written here.

Requirements

Gym: pip install gym

PyBullet: pip install pybullet (provides the pybullet_envs module)

The line below in main.py is commented out. To record selected episodes, simply uncomment line 19.

env = wrappers.Monitor(env, save_dir, force = True)

Note that on Windows machines, ffmpeg will need to be saved in the same local directory as train.py.
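For context, a minimal sketch of how the recording wrapper might be used is below; the environment name and output directory are illustrative, and the Monitor wrapper belongs to older gym releases.

```python
import gym
from gym import wrappers
import pybullet_envs  # registers the Bullet environments with gym

env = gym.make("AntBulletEnv-v0")                       # example environment
env = wrappers.Monitor(env, "./recordings", force=True)  # force=True overwrites old recordings
```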

Results

After 1 million timesteps the model achieves an average total reward of 2200 on "AntBulletEnv-v0".


The Actor loss has a local minimum around -60, where the agent gets stuck in a single position. Using the Huber loss instead of MSE was instrumental in the agent progressing past this point and achieving higher total rewards.
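As a point of reference, a minimal sketch of what swapping MSE for the Huber loss in the Critic update could look like in TensorFlow is shown below; `critic_1`, `critic_2`, `states`, `actions`, and `target_q` are placeholder names, not this repository's code.

```python
import tensorflow as tf

huber = tf.keras.losses.Huber()  # less sensitive to large TD errors than MSE

# Hypothetical critic loss illustrating the MSE -> Huber swap.
def critic_loss(critic_1, critic_2, states, actions, target_q):
    q1 = critic_1(states, actions)
    q2 = critic_2(states, actions)
    # MSE alternative: tf.keras.losses.MeanSquaredError()(target_q, q1) + ...
    return huber(target_q, q1) + huber(target_q, q2)
```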

