Pendulum-v0 doesn't converge #6

Closed
rapop opened this issue May 3, 2019 · 1 comment

rapop commented May 3, 2019

Hello,

I have just run your code on Pendulum-v0 and it doesn't train. The mean reward stays at -1400 until the end.

Did you use different hyperparameters to solve this environment?

Thanks

Owner

takuseno commented May 5, 2019

Hi, @rapop !
Thank you for trying my code!

These are the results of running the default configuration twice:
[image]

Since PPO is an on-policy method, training can be unstable, and separate runs do not always produce the same results. Although this repository follows the original paper, the latest official implementation adds an extra stabilization technique for the value function that is not described in the paper, and I have found that it also stabilizes training in my other PPO implementation. If you are interested in a more stable implementation, please see my latest code.
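The comment does not name the extra value-function technique. One stabilization that OpenAI Baselines' `ppo2` applies to the value function, and which is not part of the original paper's squared-error value loss, is value clipping: the new value prediction is clipped to stay within a range around the prediction recorded at rollout time, and the larger of the clipped and unclipped squared errors is used. Assuming that is the method meant here, a minimal pure-Python sketch of that loss follows; the function name `clipped_value_loss` and the `clip_range` default of 0.2 are illustrative assumptions, not code from this repository.

```python
from typing import Sequence


def clipped_value_loss(
    values: Sequence[float],
    old_values: Sequence[float],
    returns: Sequence[float],
    clip_range: float = 0.2,
) -> float:
    """Clipped value loss in the style of OpenAI Baselines' ppo2.

    Assumption: this mirrors Baselines-style value clipping; it is a
    hypothetical sketch, not this repository's implementation.

    values:     value predictions from the network being updated
    old_values: value predictions recorded when the rollout was collected
    returns:    value targets (e.g. GAE returns) for the same states
    """
    if len(values) == 0 or not (len(values) == len(old_values) == len(returns)):
        raise ValueError("inputs must be non-empty and have the same length")
    total = 0.0
    for v, v_old, ret in zip(values, old_values, returns):
        # Keep the new prediction within clip_range of the old prediction.
        v_clipped = v_old + max(-clip_range, min(v - v_old, clip_range))
        loss_unclipped = (v - ret) ** 2
        loss_clipped = (v_clipped - ret) ** 2
        # Use the larger (more pessimistic) of the two squared errors.
        total += max(loss_unclipped, loss_clipped)
    # Mean over samples, scaled by 0.5 as in a standard squared-error loss.
    return 0.5 * total / len(values)
```

For example, with an old estimate of 0.0, a new prediction of 1.0, a return of 2.0, and `clip_range=0.2`, the clipped prediction is 0.2 and its squared error (3.24) exceeds the unclipped one (1.0), so the loss is 0.5 * 3.24 = 1.62. In an autodiff framework, the gradient through the clipped branch is zero when the clip is active and that branch is selected, which limits how far the value function can move in a single update.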

Thank you.
