Pendulum-v0 doesn't converge #6
Hi, @rapop! These are the results from running the default configuration twice. Since PPO is an on-policy method, it can be unstable and does not always reproduce the same results. This repository follows the original paper, but the latest official implementation uses an additional stability method for the value function that is not described in the paper. I found that this stabilizes training in my other PPO implementation as well. If you are interested in a more stable implementation, please see my latest code. Thank you.
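The stability method referred to here is presumably value-function clipping, which the official PPO code applies to the critic loss in the same spirit as the clipped policy objective. A minimal NumPy sketch, assuming a `clip_eps` threshold and precomputed returns (the function and argument names are illustrative, not from this repository):

```python
import numpy as np

def clipped_value_loss(values, old_values, returns, clip_eps=0.2):
    """Clipped value loss (illustrative sketch, not this repo's API).

    The new value prediction is clipped to stay within clip_eps of the
    prediction made before the update, and the elementwise maximum
    (i.e. the more pessimistic) of the two squared errors is taken,
    which limits how far the value function can move per update.
    """
    values_clipped = old_values + np.clip(values - old_values,
                                          -clip_eps, clip_eps)
    loss_unclipped = (values - returns) ** 2
    loss_clipped = (values_clipped - returns) ** 2
    return 0.5 * np.mean(np.maximum(loss_unclipped, loss_clipped))
```

When the new prediction moves more than `clip_eps` away from the old one, the clipped term dominates and the gradient through the excess movement is suppressed, which is what damps the value-function oscillations that can destabilize on-policy training.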
Hello,
I have just run your code on Pendulum-v0 and it doesn't train. The mean reward stays around -1400 until the end.
Did you use different hyperparameters to solve this environment?
Thanks