[train/rllib] RLlib GPU storage context tests #39166

Merged
matthewdeng merged 3 commits into ray-project:master on Sep 1, 2023

krfricke

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@matthewdeng

Test succeeds here

matthewdeng marked this pull request as ready for review September 1, 2023 04:03
matthewdeng merged commit 8a6b4f8 into ray-project:master Sep 1, 2023
2 checks passed
matthewdeng added a commit to matthewdeng/ray that referenced this pull request Sep 1, 2023
krfricke deleted the rllib/gpu-storage branch September 1, 2023 07:57
GeneDer pushed a commit that referenced this pull request Sep 1, 2023
…39195)

* [CI] Remove tags for AIR and AIR smoke test (#39075)

Signed-off-by: woshiyyya <[email protected]>

* [train] New persistence mode: Finish migrating `Tune tests + examples (small)` (#39047)

Signed-off-by: Justin Yu <[email protected]>

* [train] Rename `train(config)` to `train_fn(config)` (#39065)

Our API change from `tune.report()` to `train.report()` can lead to namespace clashes when the training function is called `train`.

This has been an issue in many test migrations, including #39050 and the current failure of #38493.

This PR does a global replace of all training functions defined as `def train(config)` with `def train_fn(config)` to avoid future clashes.

Signed-off-by: Kai Fricke <[email protected]>

* [Data/Train] [Docs] Re-organize data loading performance tips (#39096)

Re-organize data loading performance tips. We want the caching and the CPU nodes sections to be together since they are both addressing the same problems of optimizing performance when you have expensive CPU preprocessing, and the latter references the former.

Signed-off-by: Amog Kamsetty <[email protected]>

* [air] Hard deprecate PredictorDeployment and PredictorWrapper (#39108)

Signed-off-by: Justin Yu <[email protected]>

* Update the DeepSpeed and Accelerate doc example with new Checkpoint API. (#39014)

Signed-off-by: woshiyyya <[email protected]>

* [train] New persistence mode: Remove some legacy `air.Checkpoint` dependencies (#39049)

Signed-off-by: Justin Yu <[email protected]>

* [train] Fix wandb/comet integration API calls (#38978)

Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations

Signed-off-by: Kai Fricke <[email protected]>

* [tune] Deprecate `tune.report`, `tune.checkpoint_dir`, `checkpoint_dir`, and `reporter` (#39093)

Signed-off-by: Justin Yu <[email protected]>

* [2.7][Example] Enable new APIs for Lightning `dolly-v2-7b` Fine-tuning Example (#39117)

Signed-off-by: Yunxuan Xiao <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [train] New persistence mode: Re-enable py37 compatibility tests (#39121)


Signed-off-by: Justin Yu <[email protected]>

* [Ray 2.7 Examples][1/n] Revamp the LightningTrainer CoLA Example (#38009)

Signed-off-by: Yunxuan Xiao <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [train] New persistence mode: Support `chdir_to_trial_dir` functionality with `RAY_CHDIR_TO_TRIAL_DIR` env var (#39107)

Signed-off-by: Justin Yu <[email protected]>

* [train] New persistence mode: Minimal `BackendExecutor` cleanup (#39187)

Signed-off-by: Justin Yu <[email protected]>

* [train/rllib] RLlib GPU storage context tests (#39166)

Signed-off-by: Kai Fricke <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [docs][train] Update Train landing and Overview pages (#38808)

Signed-off-by: angelinalg <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

---------

Signed-off-by: woshiyyya <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Amog Kamsetty <[email protected]>
Signed-off-by: Yunxuan Xiao <[email protected]>
Signed-off-by: angelinalg <[email protected]>
Co-authored-by: Yunxuan Xiao <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Co-authored-by: Kai Fricke <[email protected]>
Co-authored-by: Amog Kamsetty <[email protected]>
Co-authored-by: angelinalg <[email protected]>
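
For readers unfamiliar with the namespace clash that motivated the `train(config)` to `train_fn(config)` rename (#39065) in the commit message above, here is a minimal sketch. It does not import Ray: `make_train_module` is a hypothetical stand-in for `from ray import train` that mimics only a `report` callable, and the function names `clashing_version` and `renamed_version` are illustrative, not taken from the PR.

```python
from types import SimpleNamespace

# Collected metrics, so the sketch has something observable.
reported = []


def make_train_module():
    """Hypothetical stand-in for `from ray import train`; only `report` is mimicked."""
    return SimpleNamespace(report=reported.append)


def clashing_version():
    train = make_train_module()

    def train(config):  # rebinds the name `train` to this function
        # `train` now refers to the function itself, not the module stand-in.
        train.report({"lr": config["lr"]})

    try:
        train({"lr": 0.1})
    except AttributeError as exc:
        return str(exc)
    return None


def renamed_version():
    train = make_train_module()

    def train_fn(config):  # no clash: `train` still names the module stand-in
        train.report({"lr": config["lr"]})

    train_fn({"lr": 0.1})
    return reported[-1]
```

In this sketch, `clashing_version()` returns the `AttributeError` message because the function shadows the module-like object, while `renamed_version()` reports the metrics as intended.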
LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023
harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
Signed-off-by: Kai Fricke <[email protected]>
Co-authored-by: matthewdeng <[email protected]>
Signed-off-by: Jim Thompson <[email protected]>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
Signed-off-by: Kai Fricke <[email protected]>
Co-authored-by: matthewdeng <[email protected]>
Signed-off-by: Victor <[email protected]>