[core][autoscaler] GCS Autoscaler V2: Support autoscaler.sdk.request_resources [5/x] #35846

Merged
scv119 merged 5 commits into ray-project:master on Jun 7, 2023


@rickyyx (Contributor) commented May 27, 2023

Why are these changes needed?

This PR adds an RPC, RequestClusterResources, so that autoscaler.sdk.request_resources passes the requested resources directly to GCS. GCS then passes them on to the autoscaler through the GetClusterResourcesState RPC.
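The flow described above can be pictured with a small in-memory sketch. It is not Ray's implementation: `FakeGcs`, its method names, and the `cluster_resource_constraints` key are hypothetical stand-ins for the two RPCs named in the description. The overwrite-on-each-call behavior is an assumption, taken from the documented behavior of `request_resources`, where a new call overrides the previous one.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Bundle = Dict[str, float]


@dataclass
class FakeGcs:
    """Hypothetical in-memory stand-in for GCS; not a real Ray class."""

    requested_bundles: List[Bundle] = field(default_factory=list)

    def request_cluster_resources(self, bundles: List[Bundle]) -> None:
        # Stand-in for the RequestClusterResources RPC.
        # Assumption: each call replaces the previously stored request.
        self.requested_bundles = [dict(b) for b in bundles]

    def get_cluster_resources_state(self) -> Dict[str, List[Bundle]]:
        # Stand-in for the GetClusterResourcesState RPC that the
        # autoscaler polls; the key name is illustrative only.
        return {
            "cluster_resource_constraints": [dict(b) for b in self.requested_bundles]
        }


gcs = FakeGcs()
# SDK side: the request goes straight to GCS.
gcs.request_cluster_resources([{"CPU": 4.0}, {"GPU": 1.0}])
# Autoscaler side: the request shows up in the cluster state it reads.
state = gcs.get_cluster_resources_state()
```

The point of the sketch is only the direction of data flow: the SDK writes to GCS, and the autoscaler reads the same request back as part of the cluster state.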

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

@rickyyx (Contributor, Author) left a comment

This PR is more of a prototype. I'm putting it up early for feedback, since I think the current protobuf for cluster resource constraints does not match the semantics of the current autoscaler.sdk.request_resources:

  1. There is no per-job resource constraint.
  2. The requested resources could be as simple as a list of ResourceRequest.
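As a rough illustration of the second point, the two SDK arguments can be flattened into one plain list of resource bundles. This is a sketch under assumptions: `to_resource_requests` is a hypothetical helper, plain dicts stand in for ResourceRequest, and expanding `num_cpus=N` into N single-CPU bundles is assumed to mirror how the existing autoscaler handles the call.

```python
from typing import Dict, List, Optional

Bundle = Dict[str, float]


def to_resource_requests(
    num_cpus: Optional[int] = None,
    bundles: Optional[List[Bundle]] = None,
) -> List[Bundle]:
    """Hypothetical helper: flatten SDK arguments into one list of bundles."""
    requests: List[Bundle] = []
    if num_cpus:
        # Assumption: N CPUs are expressed as N one-CPU bundles.
        requests.extend({"CPU": 1.0} for _ in range(num_cpus))
    if bundles:
        # Copy each bundle so callers cannot mutate the stored request.
        requests.extend(dict(b) for b in bundles)
    return requests


flat = to_resource_requests(num_cpus=2, bundles=[{"GPU": 1.0}])
```

With this shape there is no per-job grouping at all, which matches the first point: the request is cluster-wide.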

src/ray/gcs/gcs_server/gcs_resource_manager.h (outdated, resolved)
src/ray/gcs/gcs_server/gcs_resource_manager.h (outdated, resolved)
src/ray/protobuf/experimental/autoscaler.proto (outdated, resolved)
@scv119 (Contributor) commented May 30, 2023

As we discussed, let's move ahead with the current implementation and defer the per-job design until later.

Signed-off-by: Ricky Xu <[email protected]>
@rickyyx force-pushed the pr-auto-request-resources branch from ac17e03 to f7f33cd on June 2, 2023 07:07
Signed-off-by: Ricky Xu <[email protected]>
@rickyyx rickyyx marked this pull request as ready for review June 2, 2023 08:39
Signed-off-by: Ricky Xu <[email protected]>
@jjyao (Collaborator) left a comment

lgtm

Signed-off-by: Ricky Xu <[email protected]>
Signed-off-by: Ricky Xu <[email protected]>
@rickyyx rickyyx added the tests-ok The tagger certifies test failures are unrelated and assumes personal liability. label Jun 7, 2023
@scv119 scv119 merged commit 3d3da03 into ray-project:master Jun 7, 2023
arvind-chandra pushed a commit to lmco/ray that referenced this pull request Aug 31, 2023
…resources [5/x] (ray-project#35846)

This PR adds an RPC, RequestClusterResources, so that autoscaler.sdk.request_resources passes the requested resources directly to GCS. GCS then passes them on to the autoscaler through the GetClusterResourcesState RPC.

Signed-off-by: e428265 <[email protected]>
4 participants