
Project - HopperBulletEnv with Twin Delayed DDPG (TD3)

Environment

Solving the environment requires an average total reward of over 2500 over 100 consecutive episodes.
Training of HopperBulletEnv is performed with the Twin Delayed DDPG (TD3) algorithm; see the original paper, Addressing Function Approximation Error in Actor-Critic Methods.
In this directory we solve the HopperBulletEnv environment in 3240 episodes with the noise parameter std = 0.03, and in 5438 episodes with std = 0.02.
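
As a rough illustration of where the noise std enters, here is a minimal sketch of Gaussian exploration noise added to a deterministic TD3 action. The agent.act interface and the [-1, 1] action bounds are assumptions for illustration, not code from this repository.

```python
import numpy as np

def select_action(agent, state, std=0.03, low=-1.0, high=1.0):
    # Deterministic policy output (hypothetical agent interface).
    action = agent.act(state)
    # Gaussian exploration noise; std is the value varied in this project.
    action = action + np.random.normal(0.0, std, size=action.shape)
    # Keep the noisy action inside the environment's action bounds.
    return np.clip(action, low, high)
```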

Training the Agent

With std = 0.03, the score 2500 was achieved in episode 3240 after 25 hours 28 minutes of training.

With std = 0.02, the score 2500 was achieved in episode 5438 after 36 hours 59 minutes of training.
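
The solved criterion itself (average total reward over 2500 on 100 consecutive episodes) can be checked compactly. The sketch below assumes the classic gym API (as used by pybullet_envs) and substitutes a random placeholder policy for the trained agent.

```python
from collections import deque

import gym
import numpy as np
import pybullet_envs  # noqa: F401 -- registers HopperBulletEnv-v0

env = gym.make('HopperBulletEnv-v0')
scores_window = deque(maxlen=100)   # scores of the last 100 episodes

for episode in range(1, 10001):
    state = env.reset()
    score, done = 0.0, False
    while not done:
        action = env.action_space.sample()   # placeholder for agent.act(state)
        state, reward, done, _ = env.step(action)
        score += reward
    scores_window.append(score)
    # Solved: average total reward over 2500 on 100 consecutive episodes.
    if len(scores_window) == 100 and np.mean(scores_window) > 2500:
        print(f'Solved in episode {episode}')
        break
```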

Relevant paper

Three aspects of Deep RL: noise, overestimation and exploration
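
On the overestimation point: TD3 bootstraps from the minimum of two target critics and smooths the target action with clipped noise. Below is a hedged PyTorch sketch of that target computation; the module names and the hyperparameter defaults follow the TD3 paper, not this repository's code.

```python
import torch

def td3_target(rewards, next_states, dones, actor_target,
               critic1_target, critic2_target,
               gamma=0.99, policy_noise=0.2, noise_clip=0.5):
    with torch.no_grad():
        # Target policy smoothing: clipped Gaussian noise on the target action.
        next_actions = actor_target(next_states)
        noise = (torch.randn_like(next_actions) * policy_noise).clamp(-noise_clip, noise_clip)
        next_actions = (next_actions + noise).clamp(-1.0, 1.0)
        # Clipped double Q-learning: take the smaller of the two critic
        # estimates to counter overestimation bias.
        q_next = torch.min(critic1_target(next_states, next_actions),
                           critic2_target(next_states, next_actions))
        return rewards + gamma * (1.0 - dones) * q_next
```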

Other TD3 projects

Video

See the video Lucky Hopper on YouTube.

Credit

The source paper is Addressing Function Approximation Error in Actor-Critic Methods
by Scott Fujimoto, Herke van Hoof, and David Meger.