TD3-TensorFlow

TD3 for OpenAI Gym environments using TensorFlow

To address the function approximation error of Actor-Critic methods, Fujimoto et al. proposed the Twin Delayed Deep Deterministic policy gradient (TD3) algorithm.

Using a total of six neural networks (an Actor, two Critics, and a target copy of each), TD3 reduces the Q-value overestimation of Actor-Critic methods by taking the minimum of the two target Critics' estimates when forming the Bellman target, and delays updates to the Actor relative to the Critics.
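For illustration, here is a minimal sketch of that clipped double-Q target in TensorFlow. This is not this repository's code: `actor_target`, `critic_1_target`, and `critic_2_target` are placeholders for the target networks, and `gamma` is the discount factor.

```python
import tensorflow as tf

# Sketch (not this repository's code) of the clipped double-Q target used by TD3.
# `actor_target`, `critic_1_target`, `critic_2_target` stand in for the target networks.
def td3_target(rewards, next_states, dones,
               actor_target, critic_1_target, critic_2_target, gamma=0.99):
    next_actions = actor_target(next_states)          # target policy's action
    q1 = critic_1_target(next_states, next_actions)   # first target Critic's estimate
    q2 = critic_2_target(next_states, next_actions)   # second target Critic's estimate
    min_q = tf.minimum(q1, q2)                        # keep the smaller estimate
    # Bellman target; (1 - dones) removes the bootstrap term at episode ends.
    return rewards + gamma * (1.0 - dones) * min_q
```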

The authors' original implementation is written in PyTorch and served as the basis for the TensorFlow version written here.

Requirements

Gym: pip install gym

PyBullet: pip install pybullet (provides the pybullet_envs module)

The line below in main.py is commented out. To record selected episodes, simply uncomment line 19.

env = wrappers.Monitor(env, save_dir, force = True)

Note that on Windows machines, ffmpeg will need to be saved in the same local directory as train.py.
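For context, a minimal sketch of how the recording wrapper might be used is below; the environment name and output directory are illustrative, and the Monitor wrapper belongs to older gym releases.

```python
import gym
from gym import wrappers
import pybullet_envs  # registers the Bullet environments with gym

env = gym.make("AntBulletEnv-v0")                       # example environment
env = wrappers.Monitor(env, "./recordings", force=True)  # force=True overwrites old recordings
```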

Results

After 1 million timesteps the model achieves an average total reward of 2200 on "AntBulletEnv-v0".


The Actor loss has a local minimum around -60, where the agent gets stuck in a single position. Using the Huber loss instead of MSE was instrumental in the agent progressing past this point and achieving higher total rewards.
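As a point of reference, a minimal sketch of what swapping MSE for the Huber loss in the Critic update could look like in TensorFlow is shown below; `critic_1`, `critic_2`, `states`, `actions`, and `target_q` are placeholder names, not this repository's code.

```python
import tensorflow as tf

huber = tf.keras.losses.Huber()  # less sensitive to large TD errors than MSE

# Hypothetical critic loss illustrating the MSE -> Huber swap.
def critic_loss(critic_1, critic_2, states, actions, target_q):
    q1 = critic_1(states, actions)
    q2 = critic_2(states, actions)
    # MSE alternative: tf.keras.losses.MeanSquaredError()(target_q, q1) + ...
    return huber(target_q, q1) + huber(target_q, q2)
```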

