ParallelEnv class yields non-correct rewards in a minigrid environment #3

Open
ycemsubakan opened this issue Dec 30, 2019 · 3 comments

Comments

@ycemsubakan

I tried to use the ParallelEnv class to run episodes in parallel, with this MiniGrid environment: https://github.com/maximecb/gym-minigrid/blob/master/README.md (MiniGrid-Empty-5x5-v0). The reward for reaching the green goal should be 1 - c * time_taken_to_reach_green (where c is a constant), but when I use ParallelEnv the rewards do not follow this. I am actually observing rewards that increase with time.
Example: say we have 10-step episodes. Normally we should observe rewards like this:
[0, 0, 0.95, 0, 0, 0.9, 0, 0, 0.85, 0]
(This is a list where the first element is the reward obtained at t=0, the second element is the reward at t=1, and so on.)
But with ParallelEnv() I am observing rewards like this:
[0, 0, 0.95, 0, 0, 0.95, 0, 0, 0.95, 0]
or even increasing rewards like the following:
[0, 0, 0.85, 0, 0, 0.90, 0, 0, 0.95, 0]

I might be misunderstanding the purpose of the ParallelEnv class: my understanding was that it gives totally independent episodes and shouldn't disrupt the original reward structure. It would be great if you could let me know how I could fix this. Thank you!
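For reference, the reward rule described above can be sketched as a small function. The name `goal_reward` and the constant `c = 1/60` are assumptions chosen only so that the numbers match the example lists in this comment; they are not taken from gym-minigrid. The sketch contrasts a step counter that runs across the whole 10-step trace with one that restarts whenever an episode ends. Whether the second case is what happens under ParallelEnv is not confirmed in this thread.

```python
def goal_reward(steps_taken: int, c: float = 1 / 60) -> float:
    """Reward on reaching the goal: 1 - c * steps_taken (c is an assumed constant)."""
    return 1.0 - c * steps_taken


# Goal reached at t = 2, 5, 8 with a counter that never restarts:
# steps_taken is t + 1, so the reward shrinks each time.
global_counter = [round(goal_reward(t + 1), 2) for t in (2, 5, 8)]

# Same trace, but the counter restarts after every goal:
# each episode takes 3 steps, so the reward stays constant.
per_episode_counter = [round(goal_reward(3), 2) for _ in range(3)]

print(global_counter)
print(per_episode_counter)
```

Under these assumptions, the first list reproduces the expected decreasing pattern and the second reproduces the constant pattern reported with ParallelEnv().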

@lcswillems
Owner

ParallelEnv just runs the agent on several environments in parallel. I don't see the link with the reward. Could you tell me how to reproduce the bug?
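To make "runs the agent on several environments in parallel" concrete, here is a minimal sequential stand-in. It is not the ParallelEnv implementation: `CountdownEnv` and `LockstepEnvs` are hypothetical names, the toy reward constant is an assumption, and the automatic reset on `done` is an assumption borrowed from how vectorized wrappers commonly behave.

```python
class CountdownEnv:
    """Hypothetical toy env: the episode ends after `length` steps
    and pays 1 - c * steps on the final step, 0 otherwise."""

    def __init__(self, length, c=1 / 60):
        self.length = length
        self.c = c
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # dummy observation

    def step(self, action):
        self.steps += 1
        done = self.steps >= self.length
        reward = 1.0 - self.c * self.steps if done else 0.0
        return self.steps, reward, done, {}


class LockstepEnvs:
    """Steps several envs with one action each; resets any env whose episode ended."""

    def __init__(self, envs):
        self.envs = envs

    def reset(self):
        return [env.reset() for env in self.envs]

    def step(self, actions):
        results = []
        for env, action in zip(self.envs, actions):
            obs, reward, done, info = env.step(action)
            if done:
                obs = env.reset()  # assumed auto-reset: next step starts a new episode
            results.append((obs, reward, done, info))
        obs, rewards, dones, infos = zip(*results)
        return list(obs), list(rewards), list(dones), list(infos)


envs = LockstepEnvs([CountdownEnv(length=3), CountdownEnv(length=5)])
envs.reset()
trace = [envs.step([0, 0])[1] for _ in range(10)]  # keep only the rewards
first_env = [round(r[0], 2) for r in trace]
second_env = [round(r[1], 2) for r in trace]
print(first_env)
print(second_env)
```

In this sketch each environment keeps its own step counter, so a 10-step reward trace from one slot can span several short episodes rather than a single one.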

@ycemsubakan
Author

I have written my own code for this, and I will try to push a version that showcases the bug shortly. In the meantime, just try the environment MiniGrid-Empty-5x5-v0 and compare the rewards within an episode with and without ParallelEnv. (Even with a single environment in ParallelEnv the rewards are not correct; I am guessing something is wrong with the time indexing?)
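One way to make the "within an episode" comparison precise is to cut the recorded reward trace at the `done` flags before comparing. The helper below is a sketch: `split_episodes` is a hypothetical name, and the `dones` list is made-up illustration data paired with one of the reward lists from this issue, not output from a real run.

```python
def split_episodes(rewards, dones):
    """Group a flat reward trace into per-episode lists using the done flags."""
    episodes, current = [], []
    for reward, done in zip(rewards, dones):
        current.append(reward)
        if done:
            episodes.append(current)
            current = []
    if current:
        episodes.append(current)  # trailing episode that has not finished yet
    return episodes


# Reward list reported with ParallelEnv(); the done flags are assumed for illustration.
rewards = [0, 0, 0.95, 0, 0, 0.95, 0, 0, 0.95, 0]
dones = [False, False, True, False, False, True, False, False, True, False]

episodes = split_episodes(rewards, dones)
lengths = [len(episode) for episode in episodes]
print(episodes)
print(lengths)
```

If the per-episode lengths and final rewards agree between the two setups, the difference in the flat trace would come from episode boundaries rather than from the reward computation.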

@ayakayal

ayakayal commented Apr 6, 2023

Hello, did you manage to fix the bug? I am currently trying to test the same thing.
