
[RLlib; Documentation] Some docstring cleanups; Rename RemoteVectorEnv into RemoteBaseEnv for clarity. #20250

Merged · 10 commits · Nov 17, 2021

Conversation

@sven1977 (Contributor) commented Nov 11, 2021

  • Some docstring cleanups
  • Soft-rename RemoteVectorEnv to RemoteBaseEnv for clarity.

Why are these changes needed?

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Nov 12, 2021
@gjoliver (Member) left a comment:

1 question and 2 minor comments. Should be quick.

while not ready:
    ready, _ = ray.wait(
        list(self.pending),
        num_returns=len(self.pending),
@gjoliver (Member):

This seems to be waiting for everything to be ready?
Looking at the code below, I don't think we need the while loop here; everything assumes ready will contain the full list of obj_refs in one shot.

@sven1977 (Contributor, Author):

I don't think that's true. In my understanding, ray.wait returns a tuple: (1) a list of ready handles and (2) a list of not-yet-ready handles. As soon as at least one handle is ready, we continue and return observations only from those ready ray.remote sub-environments.

@gjoliver (Member):

OK, yeah, that's the self.pending thing, I guess. Let me double-check the semantics of num_returns.

In any case, though, I think we should get rid of the while loop and the timeout parameter; that would be the same thing, right?

@gjoliver (Member):

OK, I verified a couple of things and understand this completely now :)
Normally, num_returns=len(self.pending) would wait until all pending actors come back; however, we also pass timeout=self.poll_timeout, so effectively the logic here is:

  1. We wait self.poll_timeout; if all workers come back with a batch in time, great, and we continue.
  2. If some workers are slow and don't come back within poll_timeout, we leave them behind and continue, so we never wait more than one poll_timeout interval.
  3. If all workers are slow and nothing comes back within poll_timeout, the while loop makes us ignore poll_timeout and keep waiting until at least something comes back. This may or may not be intended behavior, e.g. for off-policy cases.

As we discussed offline, let's just add a comment about this behavior and leave it as is for now; this sounds like a performance-impacting change to me, so we should do it in its own PR.
I'm also thinking: if we are OK with getting data back from only some of the workers, then we don't really need users to specify this parameter at all, which would be one fewer confusing parameter for them :)

But yeah, we can come back to this. Thanks :)

@sven1977 (Contributor, Author) Nov 17, 2021:

1., 2., and 3.: Yeah, that's how I understand it now, too. So removing the timeout would actually alter the logic here.
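For reference, the three cases above can be sketched with `concurrent.futures` standing in for Ray (a rough analogy only: `cf.wait` returns a `(done, not_done)` pair much like `ray.wait` returns `(ready, not_ready)`; `wait_like_ray`, `poll`, and `poll_timeout` below are modeled on the snippet under review, not the actual RLlib implementation):

```python
import concurrent.futures as cf
import time

def wait_like_ray(futures, num_returns, timeout):
    # Rough analogue of ray.wait(refs, num_returns=..., timeout=...):
    # returns (ready, not_ready) and comes back once the timeout expires,
    # even if fewer than num_returns results are in. num_returns is kept
    # only to mirror the ray.wait signature; cf.wait's default
    # ALL_COMPLETED mode already waits for everything up to the timeout.
    done, not_done = cf.wait(futures, timeout=timeout)
    return list(done), list(not_done)

def poll(pending, poll_timeout):
    # The pattern under discussion: block until at least ONE result is
    # in (the while loop), but once anything has arrived, never wait
    # longer than poll_timeout for the stragglers.
    ready = []
    while not ready:
        ready, _ = wait_like_ray(
            list(pending), num_returns=len(pending), timeout=poll_timeout)
    return ready

with cf.ThreadPoolExecutor() as pool:
    fast = pool.submit(lambda: "fast")
    slow = pool.submit(lambda: (time.sleep(0.5), "slow")[1])
    # Case 2: the fast worker beats the timeout; the slow one stays pending.
    ready = poll({fast, slow}, poll_timeout=0.1)

print(len(ready), ready[0].result())  # 1 fast
```

Case 3 falls out of the same loop: if neither worker had finished within `poll_timeout`, `ready` would stay empty and the `while` would simply wait another round.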

else:
    ob = {_DUMMY_AGENT_ID: ret}

if rew is None:
@gjoliver (Member):

Is this a fix for a specific env/version, or a general safety net for users' custom envs?
It feels like this may conceal bugs, e.g. if someone forgot to return a reward.

@sven1977 (Contributor, Author):

Great question. It's for when we have a return value from reset() (rather than step()). reset() does not return a reward (it has only one return value; step() has four).
Added a comment explaining this.

self.poll_timeout = remote_env_batch_wait_ms / 1000

self.actors = None # lazy init
self.pending = None # lazy init
@gjoliver (Member):

Any chance you can explain the role of self.pending a bit here?
I found it to be quite important; it actually carries logic between member functions.

@sven1977 (Contributor, Author):

Added a comment.
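For readers of this thread, a rough sketch of that role, with `concurrent.futures` futures standing in for Ray ObjectRefs (`RemoteRunner` and its details are hypothetical, not the actual RemoteBaseEnv code): `self.pending` maps each in-flight request to the env that issued it, and it survives between `poll()` calls, so stragglers left behind by one poll are picked up by the next.

```python
import concurrent.futures as cf

class RemoteRunner:
    # Hypothetical stand-in for the class under review.
    def __init__(self, pool, num_envs):
        self.pool = pool
        self.num_envs = num_envs
        self.pending = None  # lazy init: maps future -> env id

    def poll(self):
        if self.pending is None:
            # First poll: kick off one request per remote env.
            self.pending = {
                self.pool.submit(lambda i=i: i * 10): i
                for i in range(self.num_envs)
            }
        done, _ = cf.wait(list(self.pending))
        results = {}
        for fut in done:
            # Pop only the finished requests; unfinished ones stay in
            # self.pending and are collected by a later poll() call.
            env_id = self.pending.pop(fut)
            results[env_id] = fut.result()
        return results

with cf.ThreadPoolExecutor() as pool:
    runner = RemoteRunner(pool, num_envs=3)
    out = sorted(runner.poll().items())

print(out)  # [(0, 0), (1, 10), (2, 20)]
```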

@gjoliver (Member) left a comment:

Looks great, thanks for doing this, man!!

@sven1977 sven1977 merged commit 56619b9 into ray-project:master Nov 17, 2021
@sven1977 sven1977 deleted the docs_more_docstring_cleanups branch June 2, 2023 20:16
3 participants