
[release] update torch_tune_serve_test to use anyscale connect #16754

Merged: 4 commits into ray-project:master on Jul 7, 2021

Conversation

matthewdeng (Contributor)

Why are these changes needed?

As part of the Release Test process, these tests are being migrated to use the Anyscale connect API.

This Golden Notebook test was originally added in #16619.

Note

get_best_model is wrapped in ray.remote so that the checkpoint is loaded from the head node.
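
For reference, a minimal sketch of the ray.remote wrapping described above (not the PR's exact code; the checkpoint path and loader body here are illustrative placeholders):

import ray

ray.init()  # the release test connects through Anyscale connect / Ray Client instead

def get_best_model(checkpoint_path):
    # Illustrative stand-in for the test's loader, which torch.load()s the
    # checkpoint and returns an identifier for the best model.
    return f"model-from-{checkpoint_path}"

# Wrapping the function with ray.remote turns the call into a Ray task, so the
# checkpoint file is read on a cluster node (the head node) rather than on the
# client machine driving the script.
get_best_model_remote = ray.remote(get_best_model)
model_id = ray.get(get_best_model_remote.remote("/tmp/best_checkpoint"))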

Related issue number

n/a

Checks

e2e job

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@@ -186,7 +189,8 @@ def test_predictions(test_mode=False):

    print("Retrieving best model.")
    best_checkpoint = analysis.best_checkpoint
    model_id = get_best_model(best_checkpoint)
    get_best_model_remote = ray.remote(get_best_model)
@amogkam (Contributor) commented on Jun 29, 2021:

Is it possible to test out the ray.client().download_results() workflow here instead of wrapping in a task?
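
For context, a rough sketch of the workflow this comment suggests; download_results is referenced in a later commit on this PR, but its placement on the client builder and its signature below are assumptions, not taken from the Anyscale connect API:

import ray

# Hypothetical chaining: download_results is assumed here to be a builder
# option on the Ray Client / Anyscale connect builder; the real signature
# and address are illustrative and may differ.
ray.client("anyscale://my-cluster") \
    .download_results(local_dir="./results") \
    .connect()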

@amogkam (Contributor) left a comment:

Nice! Is this working with the release tool?

@krfricke (Contributor) left a comment:

Looks good! I was just wondering if this script also works with Ray OSS.

    model_state = torch.load(best_model_checkpoint_path)
def get_remote_model(remote_model_checkpoint_path):
    address = os.environ.get("RAY_ADDRESS")
    if address is not None and address.startswith("anyscale://"):
A Contributor commented:

Would this also work with the Ray OSS client?

@matthewdeng (Contributor, Author) replied:

Yep, for OSS this would go to the else block and use ray.remote to fetch the model. I updated the code with some better path isolation and successfully ran this script on a Ray cluster!
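
A condensed sketch of the branching described in this reply; the Anyscale-side download step is represented by a hypothetical stub, while the else branch mirrors the ray.remote pattern from the diff above:

import os
import ray
import torch

def download_checkpoint_locally(remote_path):
    # Hypothetical stand-in for fetching the checkpoint from the Anyscale
    # cluster to the local machine (the PR reportedly uses download_results).
    raise NotImplementedError

def get_remote_model(remote_model_checkpoint_path):
    address = os.environ.get("RAY_ADDRESS")
    if address is not None and address.startswith("anyscale://"):
        # Anyscale connect: bring the checkpoint to the local machine, then load it.
        local_path = download_checkpoint_locally(remote_model_checkpoint_path)
        return torch.load(local_path)
    else:
        # OSS Ray / Ray Client: load the checkpoint where it lives by running the
        # loader as a Ray task on the cluster and shipping the result back.
        load_task = ray.remote(torch.load)
        return ray.get(load_task.remote(remote_model_checkpoint_path))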

@amogkam merged commit 23088bd into ray-project:master on Jul 7, 2021
@matthewdeng deleted the torch-tune-serve branch on July 7, 2021 at 02:14
jiaodong pushed a commit that referenced this pull request on Jul 11, 2021:
* [release] update torch_tune_serve_test to use anyscale connect

* use download_results to download model checkpoint

* clean up code to support both OSS and Anyscale