-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check #26584
[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check #26584
Conversation
f190c9d
to
d23bee4
Compare
Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
d1c1668
to
ed8e0e0
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
We actually introduced a light weighted health check for GCS in this PR Do you think we should update the kube ray one to use the healthz? |
Ah, I see. Thanks for the context. Maybe we can actually update "ray health-check" to use the healthz? What do you think? |
One down side for the healthz is that it needs dashboard to be alive to use this (maybe not good for autoscaler). The good thing about But the cons for I feel each has its use case. Maybe we should unify their health check code in some way in the future. I have no strong opinion here. |
Signed-off-by: Dmitri Gekhtman <[email protected]>
Signed-off-by: Dmitri Gekhtman <[email protected]>
Confirming that this works as expected. ML test failures are unrelated. Will merge, wait for the new image to build, and set that as the default in KubeRay. |
…check (ray-project#26584) Adds a flag that disables the version check in ray health-check. Signed-off-by: Xiaowei Jiang <[email protected]>
…version check (ray-project#26584)" (ray-project#26597) Reverts ray-project#26584 Seems it breaking test_advanced_4 Signed-off-by: Xiaowei Jiang <[email protected]>
… status version check (ray-project#26584)" (ray-project#26597)" This reverts commit fe9a12a. Signed-off-by: Dmitri Gekhtman <[email protected]>
* Revert "Revert "[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check (ray-project#26584)" (ray-project#26597)" (Fixes test issue.) Signed-off-by: Rohan138 <[email protected]>
…check (ray-project#26584) Adds a flag that disables the version check in ray health-check. Signed-off-by: Stefan van der Kleij <[email protected]>
…version check (ray-project#26584)" (ray-project#26597) Reverts ray-project#26584 Seems it breaking test_advanced_4 Signed-off-by: Stefan van der Kleij <[email protected]>
* Revert "Revert "[KubeRay][Autoscaler][Core] Add a flag to disable ray status version check (ray-project#26584)" (ray-project#26597)" (Fixes test issue.) Signed-off-by: Stefan van der Kleij <[email protected]>
Why are these changes needed?
ray health-check
checks if the local Ray version agrees with the GCS Ray version,which is a reasonable thing to do,
unless you're deliberately mismatching Ray versions.
During this development phase of the KubeRay autoscaler integration, we are deliberately mismatching Ray and autoscaler versions. As a result, users have encountered trouble starting up the autoscaler.
Thus, we need a way of disabling the version check.
This PR introduces a flag for that.
This PR also has ray health-check print error output in a couple of spots where it previously did not, for ease of debugging.
Related issue number
Checks
scripts/format.sh
to lint the changes in this PR.I'll manually check compatibility of autoscaler with these changes against Ray 1.11.0