[bug] PPO2 episode reward summaries are written incorrectly for VecEnvs #143
Hello,

I can confirm that it works with just one env. The relevant code is in `total_episode_reward_logger`, which is called by PPO2 here and . To me it is absolutely unclear what
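For context, the stepping convention that `total_episode_reward_logger` has to handle can be sketched with a toy stand-in (the class below is hypothetical, not the gym or stable-baselines API): a vectorized env steps every sub-env at once, so the logger receives one reward and one done flag per env at each timestep.

```python
# Toy stand-in for a vectorized env (hypothetical, not DummyVecEnv itself):
# all sub-envs advance together, and step() returns batched rewards/dones.
class ToyVecEnv:
    def __init__(self, n_envs, episode_len=3):
        self.n_envs = n_envs
        self.episode_len = episode_len
        self.t = [0] * n_envs  # per-env timestep within the current episode

    def step(self):
        rewards, dones = [], []
        for i in range(self.n_envs):
            self.t[i] += 1
            done = self.t[i] >= self.episode_len
            rewards.append(1.0)
            dones.append(done)
            if done:
                self.t[i] = 0  # auto-reset on episode end, like DummyVecEnv
        return rewards, dones

env = ToyVecEnv(n_envs=2)
for _ in range(3):
    rewards, dones = env.step()
print(dones)  # [True, True] on the third step of a length-3 episode
```

Because sub-envs auto-reset, the reward logger only ever sees batched arrays, never an explicit reset boundary, which is exactly what makes per-env episode bookkeeping easy to get wrong.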
The root cause of this problem can be related to this:
@balintkozma
I created the PR; meanwhile I found another problem: if a shorter episode is added to the episode reward summary after a longer one, the graph will go backwards, because tensorboard connects the dots in the order they are added and the calculated step counter will be smaller. The backward-going curves in the zoomed-in picture can be avoided by sorting the finished episodes by length before they are added to the tensorboard summary. Not implemented yet; I will create a separate PR.
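The sorting idea described above can be sketched as follows (a minimal illustration with made-up numbers, not the actual PR): if the points are emitted so that their computed global steps come out in increasing order, tensorboard can no longer draw a curve that goes backwards.

```python
# Hypothetical helper (not stable-baselines code): when several parallel envs
# finish episodes in the same rollout, writing them in env order can produce
# non-monotonic step coordinates. Emitting them sorted by end step keeps the
# tensorboard line moving forward; within one rollout, sorting episodes by
# length achieves the same ordering.
def order_episode_points(finished_episodes):
    """finished_episodes: list of (end_step, episode_reward) tuples,
    possibly out of order across parallel envs."""
    return sorted(finished_episodes, key=lambda ep: ep[0])

# Env 0 finished a long episode at step 200, env 1 a short one at step 120.
points = [(200, -150.0), (120, -90.0)]
ordered = order_episode_points(points)
print(ordered)  # [(120, -90.0), (200, -150.0)]
```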
Please do only one PR that solves this issue.
Thanks for the quick reply! I think that could also be fixed in the same PR, as these two are related... Ninja'd by Arrafin
There are many more issues with the timestep computation than just the call to `total_episode_reward_logger`. All other plots are also wrong when running a VecEnv. I do not understand the computation of the current timestep:
Am I missing something? To me, the only requirement on the timestep computation is that the values are plotted in the same order as they were computed. I have already fixed this for my own use and would make a pull request if appreciated.
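One way to satisfy that ordering requirement can be sketched as follows (assumed names, not the stable-baselines implementation): keep a single, monotonically increasing global step counter per writer and append points in the order episodes finish.

```python
# Minimal sketch of an episode-reward logger for a VecEnv (hypothetical;
# the list of (step, reward) points stands in for tf summary writes).
class EpisodeRewardLogger:
    def __init__(self):
        self.global_step = 0   # total env steps seen so far, never decreases
        self.running = None    # per-env running episode reward
        self.points = []       # (step, episode_reward) pairs in emit order

    def log(self, rewards, dones):
        """rewards, dones: one entry per parallel env for a single timestep."""
        n_envs = len(rewards)
        if self.running is None:
            self.running = [0.0] * n_envs
        for i in range(n_envs):
            self.running[i] += rewards[i]
            if dones[i]:
                self.points.append((self.global_step, self.running[i]))
                self.running[i] = 0.0
        self.global_step += n_envs  # one vectorized step = n_envs env steps

logger = EpisodeRewardLogger()
logger.log([1.0, 2.0], [False, True])  # env 1 finishes with reward 2.0 at step 0
logger.log([1.0, 3.0], [True, False])  # env 0 finishes with reward 2.0 at step 2
print(logger.points)  # [(0, 2.0), (2, 2.0)] -- step coordinates are monotonic
```

The design choice here is that the x-coordinate is derived from one shared counter rather than recomputed per env, so points can never be written out of order.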
If you have a solution that does not change too many parts at once, go ahead and make a PR out of it :). If it is a large change, it might need time/discussion before merge, as we (try to) focus on v3.0 at the moment.
Looking at the issue again, the computation of
Hi, I also encountered some of the issues described in the comments above. A recap follows.

**PPO2 tensorboard visualization issues**

If you run ppo2 with a single process, training for 256 timesteps (
Moreover, if you try to plot data using multiple processes (for instance

**PPO tensorboard visualization proposed solution**

I implemented the following solutions for the visualization issues:
As a result, in the showcases above:
The modifications are few and straightforward. Regarding the side-by-side visualization of the rewards in the multiprocess case, do you believe that plotting the mean and variance of the collected data would be more appropriate? If appreciated, I would open a PR with the implemented modifications, which I can update if the mean-and-variance solution is recommended.
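The mean-and-variance alternative mentioned above could look roughly like this (a hypothetical sketch, not the proposed PR): instead of one curve per parallel env, log a single mean curve with a spread band over the episode returns finished during each rollout.

```python
import statistics

# Hypothetical helper: aggregate the episode returns collected in one rollout
# into a mean and a population standard deviation for a single summary curve.
def summarize_rollout_rewards(episode_rewards):
    """episode_rewards: episode returns finished during one rollout."""
    mean = statistics.mean(episode_rewards)
    std = statistics.pstdev(episode_rewards)
    return mean, std

mean, std = summarize_rollout_rewards([-90.0, -110.0, -100.0])
print(mean, std)  # mean is -100.0; std is the population standard deviation
```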
@paolo-viceconte thanks, I'll try to take a look at what you did this week (unless @Miffyli can do it before); we have too many issues related to that function (cf. all linked issues).
@Capitolhill As a quick fix, I suggest trying out stable-baselines3, which also has tensorboard support and is more actively maintained. Migration from SB2 is mostly as simple as replacing
Thanks for the response. I assume the tensorboard logging issue for multiple envs has been resolved in SB3.
For a more complete answer, you can use the "legacy" logging in SB2 (cf. doc), but it requires the use of the
Episode reward summaries are all concentrated together on a few steps, with jumps in between.
Zoomed out:
![image](https://user-images.githubusercontent.com/1750835/50369978-20aace00-0553-11e9-91a8-334ca4f405c4.png)
Zoomed in:
![image](https://user-images.githubusercontent.com/1750835/50370046-6ddb6f80-0554-11e9-914f-8b2b8c45270f.png)
Every other summary looks fine:
![image](https://user-images.githubusercontent.com/1750835/50370120-b9dae400-0555-11e9-98b6-92badee5c622.png)
To reproduce, run PPO2 on `DummyVecEnv(["Pendulum-v0" for _ in range(8)])`.