I tried to use the ParallelEnv class to create parallel episodes. I used this minigrid environment: https://github.com/maximecb/gym-minigrid/blob/master/README.md (with MiniGrid-Empty-5x5-v0). The reward should be (1 - c * time_taken_to_reach_green), where c is a constant, but when I use ParallelEnv the rewards do not follow this. I am actually observing that the rewards increase with time.
Example: say we have 10-step episodes. Normally we should observe rewards like this:
[0, 0, 0.95, 0, 0, 0.9, 0, 0, 0.85, 0]
(This is a list where the first element is the reward obtained at t=0, the second element is the reward at t=1, and so on.)
But with ParallelEnv() I am observing rewards like this:
[0, 0, 0.95, 0, 0, 0.95, 0, 0, 0.95, 0], or even increasing rewards like the following:
[0, 0, 0.85, 0, 0, 0.90, 0, 0, 0.95, 0]
I might be misunderstanding the purpose of the ParallelEnv class: my understanding was that it is supposed to give totally independent episodes without disrupting the original reward structure. It would be great if you could let me know how I could fix this. Thank you!
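For reference, here is a minimal sketch of the kind of comparison I mean. The ParallelEnv import path and its list-based reset()/step() interface are assumptions based on the torch-ac / rl-starter-files version, so adjust them to wherever your ParallelEnv lives:

```python
import gym
import gym_minigrid  # noqa: F401  -- importing registers the MiniGrid-* envs

# Assumption: ParallelEnv from torch-ac / rl-starter-files (utils/penv.py);
# change this import to match your setup.
from torch_ac.utils.penv import ParallelEnv


def rollout_direct(env_id, n_steps, seed=0):
    """Step one env directly and record the per-step rewards."""
    env = gym.make(env_id)
    env.seed(seed)
    env.reset()
    rewards = []
    for _ in range(n_steps):
        _, reward, done, _ = env.step(env.action_space.sample())
        rewards.append(reward)
        if done:
            env.reset()
    return rewards


def rollout_parallel(env_id, n_steps, seed=0):
    """Same rollout, but through ParallelEnv with a single worker."""
    env = gym.make(env_id)
    env.seed(seed)
    penv = ParallelEnv([env])  # takes a list of envs, here just one
    penv.reset()
    rewards = []
    for _ in range(n_steps):
        actions = [env.action_space.sample()]
        # Assumption: step() takes a list of actions, returns per-env results,
        # and auto-resets each env when its episode ends.
        _, rews, _, _ = penv.step(actions)
        rewards.append(rews[0])
    return rewards


if __name__ == "__main__":
    # Random actions, so run long enough to reach the goal a few times.
    print("direct:  ", rollout_direct("MiniGrid-Empty-5x5-v0", 300))
    print("parallel:", rollout_parallel("MiniGrid-Empty-5x5-v0", 300))
```

In the direct rollout the nonzero rewards shrink the longer an episode takes; in the ParallelEnv rollout they stay flat or grow, as in the lists above.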
I have written my own code for this and will try to push it shortly; it showcases the bug. But just try the environment MiniGrid-Empty-5x5-v0 and compare the rewards within an episode with and without ParallelEnv. (Even with a single environment inside ParallelEnv the rewards are not correct, so I am guessing something is wrong with the time indexing?)
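As a quick sanity check on the direct env (no ParallelEnv involved), the nonzero rewards should track the env's own step counter. If I remember right, gym-minigrid computes the terminal reward as 1 - 0.9 * step_count / max_steps (see _reward() in minigrid.py), so treat that formula as an assumption and double-check it there. A sketch:

```python
import gym
import gym_minigrid  # noqa: F401  -- importing registers MiniGrid-Empty-5x5-v0

env = gym.make("MiniGrid-Empty-5x5-v0")
env.reset()

for _ in range(500):
    _, reward, done, _ = env.step(env.action_space.sample())
    if reward > 0:
        # Assumed formula: terminal reward = 1 - 0.9 * step_count / max_steps.
        base = env.unwrapped
        expected = 1 - 0.9 * base.step_count / base.max_steps
        print(f"step_count={base.step_count}  reward={reward:.3f}  expected={expected:.3f}")
    if done:
        env.reset()
```

If the direct env matches the formula but ParallelEnv does not, that would point at how ParallelEnv handles (or resets) the step count across episodes.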
Hello, did you manage to fix the bug? I am currently trying to test the same thing.