Solving the environment requires an average total reward of over 2500 over 100 consecutive episodes.
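The solve criterion is a rolling mean over the most recent 100 episode scores. The sketch below shows one way to check it. It assumes a fixed-length window (`collections.deque` with `maxlen=100`) and, as an assumption not stated in the text, requires the window to be full before the environment can count as solved. The names `is_solved`, `WINDOW` and `TARGET` are hypothetical.

```python
from collections import deque

# Values taken from the text: 100-episode window, threshold of 2500.
WINDOW = 100
TARGET = 2500.0


def is_solved(scores_window, target=TARGET, window=WINDOW):
    """Return True once the window is full and its mean is strictly above target.

    Requiring a full window is an assumption of this sketch.
    """
    if len(scores_window) < window:
        return False
    return sum(scores_window) / len(scores_window) > target


# Holds only the most recent WINDOW scores; older ones are dropped automatically.
scores_window = deque(maxlen=WINDOW)
```

After each episode, append its score with `scores_window.append(score)` and stop training once `is_solved(scores_window)` returns True.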
Training of Walker2DBulletEnv is performed using the Soft Actor-Critic (SAC) algorithm; see the two basic papers "Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor" and "Soft Actor-Critic Algorithms and Applications". We solve the Walker2DBulletEnv environment in 6934 episodes, in 75.5 hours.
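The central step in SAC, as described in the cited papers, is the soft Bellman target for the critics. It takes the smaller of two target Q-values (clipped double-Q) and subtracts the entropy term alpha * log pi(a'|s') for a next action a' sampled from the current policy. The scalar sketch below shows only that formula and does not reproduce this repository's code. The function name and the default `gamma=0.99` and `alpha=0.2` are illustrative assumptions, not the settings used for this run.

```python
def sac_soft_target(reward, done, q1_next, q2_next, next_log_prob,
                    gamma=0.99, alpha=0.2):
    """Soft Bellman target for a single transition (s, a, r, s', done).

    q1_next, q2_next: target-critic values Q'_1(s', a') and Q'_2(s', a'),
                      where a' is sampled from the current policy pi(.|s').
    next_log_prob:    log pi(a'|s') for that sampled action.
    """
    # Clipped double-Q: use the smaller target critic to curb overestimation.
    min_q = min(q1_next, q2_next)
    # Maximum-entropy term: a low log-probability (high entropy) raises the value.
    soft_value = min_q - alpha * next_log_prob
    # Do not bootstrap past a terminal state.
    return reward + gamma * (1.0 - float(done)) * soft_value
```

For example, `sac_soft_target(1.0, False, 10.0, 12.0, -2.0)` gives 1 + 0.99 * (10 + 0.2 * 2) = 11.296. In a real implementation the same expression is evaluated on batched tensors, and alpha may be tuned automatically.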
Using the Twin Delayed DDPG (TD3) algorithm instead, the environment is solved in 9361 episodes.
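For comparison, TD3 uses a deterministic actor and therefore has no entropy term. Its published recipe adds three ingredients: clipped double-Q targets, target policy smoothing (clipped Gaussian noise on the target action), and delayed actor updates. The sketch below shows these pieces on plain Python lists. The helper names and the defaults `max_action=1.0`, `noise_std=0.2`, `noise_clip=0.5` and a policy delay of 2 follow commonly used values and are assumptions, not the configuration behind the 9361-episode result.

```python
import random


def smoothed_target_action(mu_next, max_action=1.0, noise_std=0.2,
                           noise_clip=0.5, rng=random):
    """Target policy smoothing: a' = clip(mu'(s') + clip(eps, -c, c), -max, max)."""
    out = []
    for m in mu_next:
        # Gaussian noise, clipped so the smoothed action stays near mu'(s').
        eps = max(-noise_clip, min(noise_clip, rng.gauss(0.0, noise_std)))
        # Keep the final action inside the valid action range.
        out.append(max(-max_action, min(max_action, m + eps)))
    return out


def td3_target(reward, done, q1_next, q2_next, gamma=0.99):
    """TD3 critic target: clipped double-Q, evaluated at the smoothed a'."""
    return reward + gamma * (1.0 - float(done)) * min(q1_next, q2_next)


def should_update_actor(critic_step, policy_delay=2):
    """Delayed updates: actor and target nets change once per policy_delay critic steps."""
    return critic_step % policy_delay == 0
```

The noise clip keeps the target action close to the target actor's output. This smooths the value estimate around that action instead of letting the critic exploit sharp peaks.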
...
Ep.: 6923, Total Steps: 5164722, Ep.Steps: 1000, Score: 2510.848, Avg.Score: 2434.984, Max.Score: 2544.886, Time: 75:18:42
Ep.: 6924, Total Steps: 5165722, Ep.Steps: 1000, Score: 2511.237, Avg.Score: 2458.445, Max.Score: 2544.886, Time: 75:19:39
Ep.: 6925, Total Steps: 5166722, Ep.Steps: 1000, Score: 2531.342, Avg.Score: 2458.312, Max.Score: 2544.886, Time: 75:20:32
Ep.: 6926, Total Steps: 5167722, Ep.Steps: 1000, Score: 2527.584, Avg.Score: 2458.944, Max.Score: 2544.886, Time: 75:21:24
Ep.: 6927, Total Steps: 5168722, Ep.Steps: 1000, Score: 2533.325, Avg.Score: 2459.297, Max.Score: 2544.886, Time: 75:22:16
Ep.: 6928, Total Steps: 5169722, Ep.Steps: 1000, Score: 2533.813, Avg.Score: 2459.911, Max.Score: 2544.886, Time: 75:23:08
Ep.: 6929, Total Steps: 5170722, Ep.Steps: 1000, Score: 2507.309, Avg.Score: 2459.618, Max.Score: 2544.886, Time: 75:23:59
Ep.: 6930, Total Steps: 5171722, Ep.Steps: 1000, Score: 2522.629, Avg.Score: 2462.209, Max.Score: 2544.886, Time: 75:24:52
Ep.: 6931, Total Steps: 5172722, Ep.Steps: 1000, Score: 2522.187, Avg.Score: 2480.560, Max.Score: 2544.886, Time: 75:25:44
Ep.: 6932, Total Steps: 5173722, Ep.Steps: 1000, Score: 2527.365, Avg.Score: 2495.964, Max.Score: 2544.886, Time: 75:26:36
Ep.: 6933, Total Steps: 5174722, Ep.Steps: 1000, Score: 2531.127, Avg.Score: 2516.000, Max.Score: 2544.886, Time: 75:27:28
Solved environment with Avg Score: 2516.0003478152444
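Each log line reports the episode index, cumulative and per-episode steps, the episode score, the 100-episode average, the maximum score so far, and the elapsed wall-clock time as hours:minutes:seconds, with hours going past 24. The hypothetical formatter below only reproduces that line layout from values it is given. It does not claim how the repository computes Avg.Score or Max.Score.

```python
def format_elapsed(seconds):
    """Format a duration as hh:mm:ss, letting hours exceed 24 (e.g. 75:27:28)."""
    seconds = int(seconds)
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_log_line(ep, total_steps, ep_steps, score, avg_score,
                    max_score, elapsed_seconds):
    """Build one progress line in the layout shown in the log above."""
    return (f"Ep.: {ep}, Total Steps: {total_steps}, Ep.Steps: {ep_steps}, "
            f"Score: {score:.3f}, Avg.Score: {avg_score:.3f}, "
            f"Max.Score: {max_score:.3f}, Time: {format_elapsed(elapsed_seconds)}")
```

The final timestamp 75:27:28 equals 271648 seconds, or about 75.46 hours, which matches the 75.5 hours quoted above.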
See the YouTube videos "Walking through the chess fields" and "Chessboard chase with four PyBullet actors".
Based on Pranjal Tandon's code (https://github.com/pranz24).