[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. #20250
Conversation
1 question and 2 minor comments. Should be quick.
```python
while not ready:
    ready, _ = ray.wait(
        list(self.pending),
        num_returns=len(self.pending),
        timeout=self.poll_timeout)
```
This seems to be waiting for everything to be ready?
Looking at the code below, I think we don't need the while loop here. Everything assumes ready will contain the full list of obj_refs in one shot.
I don't think that's true. In my understanding, ray.wait returns a tuple: 1) a list of ready handles and 2) a list of not-yet-ready handles. As soon as at least one handle is ready, we continue and return observations only from those ready ray.remote sub-environments.
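For reference, here's a minimal, self-contained sketch of the ray.wait() semantics described above (the toy task and the timeout value are made up purely for illustration):

```python
import time
import ray

ray.init()

@ray.remote
def slow_task(i):
    time.sleep(i)
    return i

refs = [slow_task.remote(i) for i in range(4)]

# ray.wait() returns a tuple: (ready refs, not-yet-ready refs). With
# num_returns=len(refs) plus a timeout, it returns as soon as either all
# refs are ready or the timeout expires, whichever happens first.
ready, not_ready = ray.wait(refs, num_returns=len(refs), timeout=1.5)
print(len(ready), len(not_ready))  # here: 2 ready, 2 still pending
```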
Ok, yeah, that's the self.pending thing, I guess. Let me double-check the semantics of num_returns.
In any case, I think we should get rid of the while loop and the timeout parameter; that would be the same thing, right?
Ok, I verified a couple of things and understand this completely now :)
Normally, num_returns=len(self.pending) would wait until all pending actors come back. However, we also have timeout=self.poll_timeout there, so effectively the logic is (see the sketch after this comment):
- We wait up to self.poll_timeout; if all workers come back with a batch in time, great, and we continue.
- If some workers are slow and don't come back within poll_timeout, we leave them behind and continue, so we never wait longer than one poll_timeout interval.
- If all workers are slow and nothing comes back within poll_timeout, we ignore poll_timeout and keep waiting until at least something comes back, because of the while loop. This may or may not be intended behavior, actually, for off-policy cases?
As we discussed offline, let's just add a comment about this behavior and leave it as is for now. This sounds like a performance-impacting change to me, and we should do it in a separate PR.
I am also thinking: if we are ok with getting data back from only some of the workers, then users don't really need to specify this parameter at all, which would be one fewer confusing parameter for them :)
But yeah, we can come back to this. Thanks :)
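To make the three cases concrete, here's a sketch of the loop's behavior (paraphrasing the diff above; not the exact RLlib code):

```python
import ray

def poll(pending, poll_timeout):
    # Case 1: all refs finish within poll_timeout -> a single ray.wait()
    #         call returns everything and we exit immediately.
    # Case 2: only some refs finish within poll_timeout -> `ready` is
    #         non-empty, so we exit and leave the slow refs pending.
    # Case 3: nothing finishes within poll_timeout -> `ready` is empty and
    #         the while loop keeps calling ray.wait() until at least one
    #         ref comes back (i.e. the timeout is effectively ignored).
    ready = []
    while not ready:
        ready, _ = ray.wait(
            list(pending),
            num_returns=len(pending),
            timeout=poll_timeout)
    return ready
```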
1., 2., and 3.: Yeah, that's how I understand it now, too. So removing the timeout would actually alter the logic here.
```python
else:
    ob = {_DUMMY_AGENT_ID: ret}

if rew is None:
```
Is this a fix for a specific env/version, or a general safety net for users' custom envs?
It feels like this may be concealing bugs, e.g. if a user forgot to return a reward.
Great question. It's for when we have a return value from reset() (rather than step()). reset() does not return rewards (it has only 1 return value; step() has 4 return values).
Added a comment explaining this.
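For context, a hypothetical helper illustrating the check (normalize_result and the 0.0 default are my own naming/assumptions; the old gym-style env API is what's assumed here):

```python
def normalize_result(ret, rew):
    # Under the (old) gym-style API assumed here:
    #   obs = env.reset()                   # 1 return value, no reward
    #   obs, rew, done, info = env.step(a)  # 4 return values
    # A `None` reward therefore means `ret` came from reset(), not from a
    # step() that forgot to return a reward.
    if rew is None:
        rew = 0.0  # Assumed default: no reward on reset.
    return ret, rew
```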
rllib/env/remote_base_env.py
```python
self.poll_timeout = remote_env_batch_wait_ms / 1000

self.actors = None  # lazy init
self.pending = None  # lazy init
```
Any chance you can explain the role of self.pending a bit here?
I found it to be quite important; it actually carries state between member functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment.
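For readers following along, a hedged mini-sketch of the role self.pending plays, as I understand it from the diff (the class and method names here are illustrative, not the actual RLlib code):

```python
import ray

class RemoteEnvPoller:
    """Illustrative only: `pending` maps each in-flight object ref to the
    sub-env actor that produced it, carrying the 'which actor is still
    busy' state between poll() and send_actions()."""

    def __init__(self, actors, poll_timeout):
        self.actors = actors
        self.poll_timeout = poll_timeout
        # Kick off one reset() per actor; the refs become the dict keys.
        self.pending = {a.reset.remote(): a for a in actors}

    def poll(self):
        ready, _ = ray.wait(
            list(self.pending),
            num_returns=len(self.pending),
            timeout=self.poll_timeout)
        # Pop finished refs: those actors are free to receive new actions.
        return {self.pending.pop(ref): ray.get(ref) for ref in ready}

    def send_actions(self, actions_by_actor):
        for actor, action in actions_by_actor.items():
            # Each new step() call puts an in-flight ref back into pending.
            self.pending[actor.step.remote(action)] = actor
```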
looks great, thanks for doing this man!!
Why are these changes needed?
Related issue number
Checks
I've run scripts/format.sh to lint the changes in this PR.