I tried to use the ParallelEnv class to create parallel episodes. I used this minigrid environment: https://github.com/maximecb/gym-minigrid/blob/master/README.md (with MiniGrid-Empty-5x5-v0). The reward should be (1 - c * time_taken_to_reach_green), where c is a constant, but when I use ParallelEnv the rewards do not follow this. I am actually observing that the rewards increase with time.
Example: say we have 10-step episodes. Normally we should observe rewards like this:
[0, 0, 0.95, 0, 0, 0.9, 0, 0, 0.85, 0]
(This is a list where the first element is the reward obtained at t=0, the second element is the reward at t=1, and so on.)
But with ParallelEnv() I am observing rewards like this:
[0, 0, 0.95, 0, 0, 0.95, 0, 0, 0.95, 0], or even increasing rewards like the following:
[0, 0, 0.85, 0, 0, 0.90, 0, 0, 0.95, 0]
I might be misunderstanding the purpose of the ParallelEnv class: my understanding was that it is supposed to give totally independent episodes without disrupting the original reward structure. It would be great if you could let me know how I could fix this. Thank you!
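For reference, here is a minimal sketch of the kind of comparison I mean. The ParallelEnv import path and its list-based reset()/step() interface are assumptions based on the torch-ac / rl-starter-files version, so adjust them to wherever your ParallelEnv lives:

```python
import gym
import gym_minigrid  # noqa: F401  -- importing registers the MiniGrid-* envs

# Assumption: ParallelEnv from torch-ac / rl-starter-files (utils/penv.py);
# change this import to match your setup.
from torch_ac.utils.penv import ParallelEnv


def rollout_direct(env_id, n_steps, seed=0):
    """Step one env directly and record the per-step rewards."""
    env = gym.make(env_id)
    env.seed(seed)
    env.reset()
    rewards = []
    for _ in range(n_steps):
        _, reward, done, _ = env.step(env.action_space.sample())
        rewards.append(reward)
        if done:
            env.reset()
    return rewards


def rollout_parallel(env_id, n_steps, seed=0):
    """Same rollout, but through ParallelEnv with a single worker."""
    env = gym.make(env_id)
    env.seed(seed)
    penv = ParallelEnv([env])  # takes a list of envs, here just one
    penv.reset()
    rewards = []
    for _ in range(n_steps):
        actions = [env.action_space.sample()]
        # Assumption: step() takes a list of actions, returns per-env results,
        # and auto-resets each env when its episode ends.
        _, rews, _, _ = penv.step(actions)
        rewards.append(rews[0])
    return rewards


if __name__ == "__main__":
    # Random actions, so run long enough to reach the goal a few times.
    print("direct:  ", rollout_direct("MiniGrid-Empty-5x5-v0", 300))
    print("parallel:", rollout_parallel("MiniGrid-Empty-5x5-v0", 300))
```

In the direct rollout the nonzero rewards shrink the longer an episode takes; in the ParallelEnv rollout they stay flat or grow, as in the lists above.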
I have written my own code for this and will try to push it shortly; it showcases the bug. But just try the environment MiniGrid-Empty-5x5-v0 and compare the rewards within an episode with and without ParallelEnv. (Even with a single environment inside ParallelEnv the rewards are not correct, so I am guessing something is wrong with the time indexing?)
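As a quick sanity check on the direct env (no ParallelEnv involved), the nonzero rewards should track the env's own step counter. If I remember right, gym-minigrid computes the terminal reward as 1 - 0.9 * step_count / max_steps (see _reward() in minigrid.py), so treat that formula as an assumption and double-check it there. A sketch:

```python
import gym
import gym_minigrid  # noqa: F401  -- importing registers MiniGrid-Empty-5x5-v0

env = gym.make("MiniGrid-Empty-5x5-v0")
env.reset()

for _ in range(500):
    _, reward, done, _ = env.step(env.action_space.sample())
    if reward > 0:
        # Assumed formula: terminal reward = 1 - 0.9 * step_count / max_steps.
        base = env.unwrapped
        expected = 1 - 0.9 * base.step_count / base.max_steps
        print(f"step_count={base.step_count}  reward={reward:.3f}  expected={expected:.3f}")
    if done:
        env.reset()
```

If the direct env matches the formula but ParallelEnv does not, that would point at how ParallelEnv handles (or resets) the step count across episodes.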
Hello, did you manage to fix the bug? I am currently trying to test the same thing.