[RLlib] Enable eager_tracing=True by default. #36556

Merged


@sven1977 sven1977 commented Jun 19, 2023

Enable eager_tracing=True by default.

Running Algorithms with framework="tf2" is considerably slower under the old default, eager_tracing=False. This PR switches the default from eager_tracing=False to eager_tracing=True, RLlib-wide.

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: sven1977 <[email protected]>
    self.assertGreaterEqual(coeff, 0.0001)
else:
    self.assertLessEqual(coeff, 0.01)
    self.assertGreaterEqual(coeff, 0.001)
Contributor

Can we redesign this test a little bit?

It could be simpler for example by doing:

entropy_coeff_schedule=[[0, 0.1], [200, 0.001], [600, 0.0001]]

Also, _step_n_times() should be "step_until_n_steps_reached()".
We should then be able to reuse this with entropy coefficient tests for other algorithms if so desired.
The "~100 timesteps" thing can easily change per algorithm or when something else in the algorithm under test changes that has nothing to do with the coefficient schedule.

Contributor Author

I need to fix this, yes. I think b/c eager tracing is much faster, the async sampling also runs faster in the background.
I will add a proper check here to make sure this test performs the right checks based on the actual timesteps sampled.
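One way to make the check depend on the timesteps actually sampled, rather than on a fixed "~100 timesteps" guess, is sketched below. It assumes the schedule is interpolated linearly between its [timestep, value] points and held constant outside them. Both helper names, including the `step_fn` callback that returns a cumulative timestep count, are hypothetical and are not RLlib's API.

```python
from bisect import bisect_right


def expected_coeff(schedule, timestep):
    """Piecewise-linear value of a [[timestep, value], ...] schedule."""
    steps = [point[0] for point in schedule]
    # Clamp outside the schedule's range (an assumption about the endpoints).
    if timestep <= steps[0]:
        return schedule[0][1]
    if timestep >= steps[-1]:
        return schedule[-1][1]
    # Find the segment [t0, t1) that contains `timestep`.
    i = bisect_right(steps, timestep)
    t0, v0 = schedule[i - 1]
    t1, v1 = schedule[i]
    return v0 + (timestep - t0) / (t1 - t0) * (v1 - v0)


def step_until_n_steps_reached(step_fn, n):
    """Call step_fn() until the cumulative timestep count it returns is >= n."""
    total = 0
    while total < n:
        total = step_fn()
    return total
```

A test could then compare the observed coefficient against `expected_coeff(schedule, total)` within a tolerance, where `total` is whatever `step_until_n_steps_reached` returned, so faster background sampling no longer invalidates the assertion.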

    and not exp["config"].get("eager_tracing") is False
):

    exp["config"]["eager_tracing"] = True
Contributor

Can we actually make this so that when local mode is True for the regression tests script (we use this only for debugging, right?), eager tracing will be False?

Contributor

I think in almost all cases we are happy that eager tracing is disabled when using local mode here?

Contributor Author

True, but I feel like then you should just set it in your config, no? It's not good to change stuff b/c we assume something w/o the user being in control. I have run into this issue several times while debugging, thinking that eager tracing was True (I wanted to debug a bug that only happened for eager_tracing=True), when it wasn't b/c I was also using local mode. It took me a while to find out that RLlib had sneakily changed my config :)
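The principle argued for here, that a default fills in only what the user left unset and never overrides an explicit choice, can be illustrated with a plain dict. This is a minimal sketch and not the regression script's or RLlib's actual code; `resolve_eager_tracing` is a hypothetical helper.

```python
def resolve_eager_tracing(config, default=True):
    """Return a copy of `config` with eager_tracing set only if it was unset."""
    resolved = dict(config)  # Copy so the caller's config is never mutated.
    # setdefault() keeps an explicit user value, including an explicit False.
    resolved.setdefault("eager_tracing", default)
    return resolved


# An unset key picks up the new default.
print(resolve_eager_tracing({"framework": "tf2"}))
# An explicit False (e.g. for step-through debugging) is left alone.
print(resolve_eager_tracing({"framework": "tf2", "eager_tracing": False}))
```

With this shape, anyone who wants eager tracing off while debugging in local mode sets it in their own config, and nothing else changes it behind their back.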

Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jun 20, 2023
Signed-off-by: sven1977 <[email protected]>
@sven1977 sven1977 merged commit a3ec4a9 into ray-project:master Jun 20, 2023
13 of 42 checks passed
vitsai pushed a commit to vitsai/ray that referenced this pull request Jun 21, 2023
SongGuyang pushed a commit to alipay/ant-ray that referenced this pull request Jul 12, 2023
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023