HalfCheetahBulletEnv-TD3

Project - HalfCheetahBulletEnv with Twin Delayed DDPG (TD3)

Environment

Solving the environment requires an average total reward of over 3000 across 100 consecutive episodes.
The environment was solved in 1588 episodes (21 hours 18 minutes of training) using the Twin Delayed DDPG (TD3) algorithm;
see the original paper, Addressing Function Approximation Error in Actor-Critic Methods.
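
The solving criterion above can be checked with a rolling window over episode rewards. The following is a minimal sketch, not the project's actual training loop; the function name `first_solved_episode` and its parameters are hypothetical, and only the threshold (3000) and window length (100) come from this README.

```python
from collections import deque

SOLVED_THRESHOLD = 3000.0  # average total reward stated in this README
WINDOW = 100               # number of consecutive episodes stated in this README


def first_solved_episode(episode_rewards, threshold=SOLVED_THRESHOLD, window=WINDOW):
    """Return the 1-based episode index at which the mean of the last
    `window` rewards first exceeds `threshold`, or None if it never does.

    Hypothetical helper for illustration only.
    """
    recent = deque(maxlen=window)  # keeps only the most recent `window` rewards
    for episode, reward in enumerate(episode_rewards, start=1):
        recent.append(reward)
        # The criterion applies only once a full window of episodes exists.
        if len(recent) == window and sum(recent) / window > threshold:
            return episode
    return None
```

In a training loop, the same `deque(maxlen=100)` could be updated after each episode and the run stopped as soon as its mean exceeds 3000.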

For the three TD3 tricks, see the Walker2DBulletEnv-v0_TD3 README.
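
For orientation, the three tricks introduced in the TD3 paper are clipped double-Q learning, delayed policy updates, and target policy smoothing. The sketch below illustrates them in plain Python for a single transition. It is not this project's implementation: the function names are hypothetical, the critics are passed in as ordinary callables, and the defaults (`gamma=0.99`, `policy_noise=0.2`, `noise_clip=0.5`, `policy_freq=2`, action bounds of [-1, 1]) are assumed values, not settings confirmed by this README.

```python
import random


def td3_target(reward, done, next_action, q1_target, q2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5,
               low=-1.0, high=1.0, rng=random):
    """Compute a TD3-style critic target for one transition (illustrative).

    `next_action` is the target actor's action for the next state;
    `q1_target` and `q2_target` are callables mapping an action to a scalar.
    """
    # Target policy smoothing: add clipped Gaussian noise to the target action.
    smoothed = []
    for a in next_action:
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, policy_noise)))
        smoothed.append(max(low, min(high, a + eps)))
    # Clipped double-Q learning: take the smaller of the two critic estimates.
    q_min = min(q1_target(smoothed), q2_target(smoothed))
    # No bootstrapping from terminal states.
    return reward + gamma * (1.0 - float(done)) * q_min


def should_update_actor(step, policy_freq=2):
    """Delayed policy updates: update the actor only every `policy_freq` critic steps."""
    return step % policy_freq == 0
```

Taking the minimum of two critics counteracts the overestimation bias that the paper analyses, while the delayed actor update lets the critics settle before the policy is changed.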

Exploration noise

Exploration noise is a crucial parameter in TD3. For this project, the parameter std_noise is set to 0.05.
For details, see Three aspects of Deep RL: noise, overestimation and exploration.
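
As a rough illustration of how such a parameter is typically used, the sketch below adds zero-mean Gaussian noise with standard deviation `std_noise` to each action component and clips the result to the action bounds. The helper name `add_exploration_noise` is hypothetical, and the bounds of [-1, 1] are an assumption about the environment's action space; only the value 0.05 comes from this README.

```python
import random

STD_NOISE = 0.05  # value of std_noise stated in this README


def add_exploration_noise(action, std_noise=STD_NOISE, low=-1.0, high=1.0, rng=random):
    """Return a copy of `action` with Gaussian exploration noise added.

    Hypothetical helper; `low` and `high` are assumed action bounds.
    """
    noisy = []
    for a in action:
        perturbed = a + rng.gauss(0.0, std_noise)    # zero-mean Gaussian noise
        noisy.append(min(high, max(low, perturbed)))  # keep the action in bounds
    return noisy
```

A larger `std_noise` spreads actions further from the policy's output and so explores more, at the cost of noisier behaviour during data collection.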

Training Score

Other TD3 projects

Videos

See the videos Such a fast cheetah and
Chessboard chase with four Pybullet actors on YouTube.

Credit

The source paper is Addressing Function Approximation Error in Actor-Critic Methods
by Scott Fujimoto, Herke van Hoof, and David Meger.