Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RLlib][early-kickoff] RNNs and RLModules #32723

Merged
merged 469 commits into from
Jun 28, 2023

Conversation

ArturNiederfahrenhorst
Copy link
Contributor

@ArturNiederfahrenhorst ArturNiederfahrenhorst commented Feb 21, 2023

Note: This PR started as a fork of #30997 to experiment with how we incorporate RNNs into RLModules.

Why are these changes needed?

This PR makes stateful RLModules compatible with our legacy sampling stack.
Most notably, this includes the following changes:

  • SampleBatches in the RNN case are pushed out of the EnvRunner with tensors in [B, T, ...] format.
  • We add Tokenizers. Tokenizers are simply Encoders that live inside an LSTM encoder or GRU encoder.
  • RLModules update Policy's view requirements (similar to what Models did before).
  • RLModules get a is_recurrent method.
  • A method auto_fold_unfold_time that our heads use under the hood to automatically reshape inputs that have a time dimension before pushing them through the model and reshaping them back to add back the time dimension afterwards.

Additionally, this PR deals a lot with our new place for states "state_out" vs the old "state_out_..." and makes places distinguish between the RLModules and non-RLModule case.

Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
@kouroshHakha kouroshHakha changed the title [RLlib][early-kickoff] RNNs and RLModules [RLlib][early_kickoff] RNNs and RLModules Jun 27, 2023
kouroshHakha and others added 13 commits June 27, 2023 10:59
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: kourosh hakhamaneshi <[email protected]>
Signed-off-by: kourosh hakhamaneshi <[email protected]>
@kouroshHakha
Copy link
Contributor

So there are two quality issues with this PR which I am intentionally postponing to another follow-up / clean up PR after the branch cut for 2.6:

  1. Having the auto_fold_unfold_time decorator depend on the spec checking module is not the cleanest solution. Also there is a lot of reduntant code between tf / torch which would make maintenance a huge pain. @ArturNiederfahrenhorst We should come back and fix this issue
  2. The overriding of v and r values in compute_advantages does not sound right. In my opinion the caller should write the correct v and r inside the SampleBatch already. We should investigate that further.

@kouroshHakha kouroshHakha changed the title [RLlib][early_kickoff] RNNs and RLModules [RLlib][early-kickoff] RNNs and RLModules Jun 27, 2023
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
@kouroshHakha kouroshHakha merged commit 960032a into ray-project:master Jun 28, 2023
2 checks passed
SongGuyang pushed a commit to alipay/ant-ray that referenced this pull request Jul 12, 2023
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by:  Artur Niederfahrenhorst <[email protected]>
Signed-off-by: 久龙 <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by:  Artur Niederfahrenhorst <[email protected]>
Signed-off-by: harborn <[email protected]>
harborn pushed a commit to harborn/ray that referenced this pull request Aug 17, 2023
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by:  Artur Niederfahrenhorst <[email protected]>
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
Signed-off-by: Artur Niederfahrenhorst <[email protected]>
Signed-off-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by: Kourosh Hakhamaneshi <[email protected]>
Co-authored-by:  Artur Niederfahrenhorst <[email protected]>
Signed-off-by: e428265 <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants