Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check #26584

Merged

Conversation

DmitriGekhtman
Copy link
Contributor

@DmitriGekhtman DmitriGekhtman commented Jul 14, 2022

Why are these changes needed?

ray health-check checks if the local Ray version agrees with the GCS Ray version,
which is a reasonable thing to do,
unless you're deliberately mismatching Ray versions.

During this development phase of the KubeRay autoscaler integration, we are deliberately mismatching Ray and autoscaler versions. As a result, users have encountered trouble starting up the autoscaler.

Thus, we need a way of disabling the version check.
This PR introduces a flag for that.

This PR also has ray health-check print error output in a couple of spots where it previously did not, for ease of debugging.

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Manual check
      I'll manually check compatibility of autoscaler with these changes against Ray 1.11.0

@DmitriGekhtman DmitriGekhtman changed the title [KubeRay][Autoscaler][Core] [KubeRay][Autoscaler][Core] Add a flag to disable ray status version check Jul 14, 2022
Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
Copy link
Contributor

@tbabej tbabej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@fishbone
Copy link
Contributor

We actually introduced a light weighted health check for GCS in this PR

Do you think we should update the kube ray one to use the healthz?

@DmitriGekhtman
Copy link
Contributor Author

We actually introduced a light weighted health check for GCS in this PR

Do you think we should update the kube ray one to use the healthz?

Ah, I see. Thanks for the context. Maybe we can actually update "ray health-check" to use the healthz? What do you think?

@fishbone
Copy link
Contributor

fishbone commented Jul 14, 2022

One down side for the healthz is that it needs dashboard to be alive to use this (maybe not good for autoscaler). The good thing about ray health-check is that it has no dependence.

But the cons for ray health-check is that it always need to start a gRPC connection, which is slightly heavier. And also, it needs ray to be installed which is very bad for some cases.

I feel each has its use case. Maybe we should unify their health check code in some way in the future.

I have no strong opinion here.

Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
@DmitriGekhtman
Copy link
Contributor Author

Confirming that this works as expected. ML test failures are unrelated.

Will merge, wait for the new image to build, and set that as the default in KubeRay.

@DmitriGekhtman DmitriGekhtman merged commit a304d1c into ray-project:master Jul 15, 2022
scv119 added a commit that referenced this pull request Jul 15, 2022
scv119 added a commit that referenced this pull request Jul 15, 2022
…version check (#26584)" (#26597)

Reverts #26584

Seems it breaking test_advanced_4
xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022
…check (ray-project#26584)

Adds a flag that disables the version check in ray health-check.

Signed-off-by: Xiaowei Jiang <[email protected]>
xwjiang2010 pushed a commit to xwjiang2010/ray that referenced this pull request Jul 19, 2022
…version check (ray-project#26584)" (ray-project#26597)

Reverts ray-project#26584

Seems it breaking test_advanced_4

Signed-off-by: Xiaowei Jiang <[email protected]>
DmitriGekhtman added a commit to DmitriGekhtman/ray that referenced this pull request Jul 19, 2022
… status version check (ray-project#26584)" (ray-project#26597)"

This reverts commit fe9a12a.

Signed-off-by: Dmitri Gekhtman <[email protected]>
DmitriGekhtman added a commit that referenced this pull request Jul 19, 2022
* Revert "Revert "[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check (#26584)" (#26597)"

(Fixes test issue.)
Rohan138 pushed a commit to Rohan138/ray that referenced this pull request Jul 28, 2022
* Revert "Revert "[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check (ray-project#26584)" (ray-project#26597)"

(Fixes test issue.)

Signed-off-by: Rohan138 <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…check (ray-project#26584)

Adds a flag that disables the version check in ray health-check.

Signed-off-by: Stefan van der Kleij <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
…version check (ray-project#26584)" (ray-project#26597)

Reverts ray-project#26584

Seems it breaking test_advanced_4

Signed-off-by: Stefan van der Kleij <[email protected]>
Stefan-1313 pushed a commit to Stefan-1313/ray_mod that referenced this pull request Aug 18, 2022
* Revert "Revert "[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check (ray-project#26584)" (ray-project#26597)"

(Fixes test issue.)

Signed-off-by: Stefan van der Kleij <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants