Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[tune] Deprecate tune.report, tune.checkpoint_dir, checkpoint_dir, and reporter #39093

Merged
merged 16 commits into from
Aug 31, 2023

Conversation

justinvyu
Copy link
Contributor

@justinvyu justinvyu commented Aug 30, 2023

Why are these changes needed?

In 2.7, we’re unifying some key APIs and implementations to be in ray.train, and this has some impact on legacy Tune APIs that have either been deprecated or not recommended for a long time.

Before 2.0, but also supported through 2.6:

def train_fn(config, reporter, checkpoint_dir: str = None):
    if checkpoint_dir:  # [1]
        torch.load(...)
    reporter(loss=1, accuracy=0)  # [2]

    with tune.checkpoint_dir(step=0) as checkpoint_dir:  # [3]
         torch.save(...)

    tune.report(loss=1, accuracy=0)  # [4] -- same as [2]

results = tune.Tuner(train_fn).fit()

API for 2.7+:

def train_fn(config):
    checkpoint: train.Checkpoint = ray.train.get_checkpoint()  # [1]
    if checkpoint:
        with checkpoint.as_directory() as checkpoint_dir:
            torch.load(...)

    with tempfile.TemporaryDirectory() as temp_checkpoint_dir:
        torch.save(...)
        ray.train.report(
            {"loss": 1, "accuracy": 0},  # [2, 3]
            checkpoint=Checkpoint.from_directory(temp_checkpoint_dir)  # [4]
        )

results = tune.Tuner(train_fn).fit()

Related issue number

Closes #39035

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Copy link
Contributor

@krfricke krfricke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice.

Are we planning to land this for 2.7?

@justinvyu
Copy link
Contributor Author

Yep, going to pick for 2.7

@matthewdeng matthewdeng added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Aug 31, 2023
Copy link
Contributor

@matthewdeng matthewdeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really awesome, thank you for cleaning this up!

@matthewdeng matthewdeng merged commit 9ed2fa2 into ray-project:master Aug 31, 2023
61 of 68 checks passed
matthewdeng pushed a commit to matthewdeng/ray that referenced this pull request Sep 1, 2023
GeneDer pushed a commit that referenced this pull request Sep 1, 2023
…39195)

* [CI] Remove tags for AIR and AIR smoke test (#39075)

Signed-off-by: woshiyyya <[email protected]>

* [train] New persistence mode: Finish migrating `Tune tests + examples (small)` (#39047)

Signed-off-by: Justin Yu <[email protected]>

* [train] Rename `train(config)` to `train_fn(config)` (#39065)

Our API change from `tune.report()` to `train.report()` can lead to namespace clashes when the training function is called `train`.

This has been an issue in many test migrations, including #39050 and the current failure of #38493.

This PR does a global replace of all training function defined as `def train(config)` with `def train_fn(config)` to avoid future clashes.

Signed-off-by: Kai Fricke <[email protected]>

* [Data/Train] [Docs] Re-organize data loading performance tips (#39096)

Re-organize data loading performance tips. We want the caching and the CPU nodes sections to be together since they are both addressing the same problems of optimizing performance when you have expensive CPU preprocessing, and the latter references the former.

Signed-off-by: Amog Kamsetty <[email protected]>

* [air] Hard deprecate PredictorDeployment and PredictorWrapper (#39108)

Signed-off-by: Justin Yu <[email protected]>

* Update the DeepSpeed and Accelerate doc example with new Checkpoint API. (#39014)

Signed-off-by: woshiyyya <[email protected]>

* [train] New persistence mode: Remove some legacy `air.Checkpoint` dependencies (#39049)

Signed-off-by: Justin Yu <[email protected]>

* [train] Fix wandb/comet integration API calls (#38978)

Removes remaining calls to checkpoint.dir_or_data in the wandb/comet integrations

Signed-off-by: Kai Fricke <[email protected]>

* [tune] Deprecate `tune.report`, `tune.checkpoint_dir`, `checkpoint_dir`, and `reporter` (#39093)

Signed-off-by: Justin Yu <[email protected]>

* [2.7][Example] Enable new APIs for Lightning `dolly-v2-7b` Fine-tuning Example (#39117)

Signed-off-by: Yunxuan Xiao <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [train] New persistence mode: Re-enable py37 compatibility tests (#39121)


Signed-off-by: Justin Yu <[email protected]>

* [Ray 2.7 Examples][1/n] Revamp the LightningTrainer CoLA Example (#38009)

Signed-off-by: Yunxuan Xiao <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [train] New persistence mode: Support `chdir_to_trial_dir` functionality with `RAY_CHDIR_TO_TRIAL_DIR` env var (#39107)

Signed-off-by: Justin Yu <[email protected]>

* [train] New persistence mode: Minimal `BackendExecutor` cleanup (#39187)

Signed-off-by: Justin Yu <[email protected]>

* [train/rllib] RLlib GPU storage context tests (#39166)

Signed-off-by: Kai Fricke <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

* [docs][train] Update Train landing and Overview pages (#38808)

Signed-off-by: angelinalg <[email protected]>
Co-authored-by: matthewdeng <[email protected]>

---------

Signed-off-by: woshiyyya <[email protected]>
Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Kai Fricke <[email protected]>
Signed-off-by: Amog Kamsetty <[email protected]>
Signed-off-by: Yunxuan Xiao <[email protected]>
Signed-off-by: angelinalg <[email protected]>
Co-authored-by: Yunxuan Xiao <[email protected]>
Co-authored-by: Justin Yu <[email protected]>
Co-authored-by: Kai Fricke <[email protected]>
Co-authored-by: Amog Kamsetty <[email protected]>
Co-authored-by: angelinalg <[email protected]>
LeonLuttenberger pushed a commit to jaidisido/ray that referenced this pull request Sep 5, 2023
harborn pushed a commit to harborn/ray that referenced this pull request Sep 8, 2023
jimthompson5802 pushed a commit to jimthompson5802/ray that referenced this pull request Sep 12, 2023
…r`, and `reporter` (ray-project#39093)

Signed-off-by: Justin Yu <[email protected]>
Signed-off-by: Jim Thompson <[email protected]>
vymao pushed a commit to vymao/ray that referenced this pull request Oct 11, 2023
@justinvyu justinvyu deleted the deprecate_tune_report branch December 14, 2023 22:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests-ok The tagger certifies test failures are unrelated and assumes personal liability.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[train][GA] Hard deprecate tune.report / tune.checkpoint_dir
4 participants