[autoscaler][kuberay] Deflake KubeRay autoscaling test #26411
```diff
 # This image will be used for both the Ray nodes and the autoscaler.
 # The CI should pass an image built from the test branch.
-RAY_IMAGE = os.environ.get("RAY_IMAGE", "rayproject/ray:448f52")
+RAY_IMAGE = os.environ.get("RAY_IMAGE", "rayproject/ray:nightly-py38")
```
There's no particular reason for py38, other than that I use a Python 3.8 environment locally (Ray images default to Python 3.7).
Nightly seems like a reasonable default for a test whose primary purpose is to test PRs going into the master branch.
(The CI specifies an image built from the PR branch.)
This PR looks good. But why does moving this code out of the pipeline and into the Python test help?
I should clarify: the pipeline is set up to retry tests, and I think this external retry logic may introduce race conditions around creation and teardown of the operator and CRD. This change also makes the test slightly more convenient for me to run repeatedly from my local setup.
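One way to read the point above: if retries happen inside the Python test rather than in the CI pipeline, each attempt can finish tearing down the operator and CRD before the next attempt recreates them. Below is a minimal sketch of that idea, assuming hypothetical `setup`, `test`, and `teardown` callables standing in for operator/CRD creation, the autoscaling checks, and cleanup. It is an illustration, not the PR's actual code.

```python
from typing import Callable, Optional


def run_with_retries(
    setup: Callable[[], None],
    test: Callable[[], None],
    teardown: Callable[[], None],
    num_attempts: int = 3,
) -> int:
    """Hypothetical helper: run `test` up to `num_attempts` times.

    Each attempt is setup -> test -> teardown, and teardown always completes
    before the next attempt's setup begins, so attempts never overlap.
    Returns the 1-based number of the attempt that passed; re-raises the
    last error if every attempt fails.
    """
    if num_attempts < 1:
        raise ValueError("num_attempts must be at least 1")
    last_error: Optional[Exception] = None
    for attempt in range(1, num_attempts + 1):
        try:
            # Setup is inside the try so a partial setup is still torn down.
            setup()
            test()
            return attempt
        except Exception as e:  # assumption: any failure is retryable
            last_error = e
        finally:
            # Runs on both success and failure, before the next iteration.
            teardown()
    assert last_error is not None
    raise last_error
```

Keeping the retry loop in-process makes the ordering explicit: no attempt can start creating resources while a previous attempt is still deleting them.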
This reverts commit e5a70bb.
Looks like multiple test failures need to be fixed.
I'm not sure how these changes could have triggered the failures, but that's quite a few of them.
It's looking better after rebasing.
…6411)
Improves stability of the KubeRay autoscaling test.
Signed-off-by: Stefan van der Kleij <[email protected]>
Why are these changes needed?
Deflakes the KubeRay autoscaling e2e test, which appears to be suffering from a race condition involving CRD creation and registration.
Background: registering a CRD takes a bit of time after the request to create the CRD object returns.
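Given that delay, a test can poll until the CRD is actually registered before creating custom resources of that type. Below is a minimal sketch, assuming `kubectl` is on the `PATH`. `wait_for_condition`, `is_established`, and `crd_is_established` are hypothetical helpers, not the PR's implementation. The sketch checks the CRD's `Established` status condition, which the Kubernetes API server sets to `True` once it is serving the new resource type.

```python
import json
import subprocess
import time
from typing import Callable


def wait_for_condition(
    predicate: Callable[[], bool], timeout: float = 60.0, interval: float = 1.0
) -> None:
    """Hypothetical helper: poll `predicate` until it returns True or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


def is_established(crd: dict) -> bool:
    """True if a CRD object (parsed JSON) reports Established=True."""
    conditions = crd.get("status", {}).get("conditions", [])
    return any(
        c.get("type") == "Established" and c.get("status") == "True"
        for c in conditions
    )


def crd_is_established(crd_name: str) -> bool:
    """Fetch the CRD via kubectl (assumed available) and check its status."""
    result = subprocess.run(
        ["kubectl", "get", "crd", crd_name, "-o", "json"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return False  # The CRD object is not visible yet.
    return is_established(json.loads(result.stdout))
```

For example, `wait_for_condition(lambda: crd_is_established("rayclusters.ray.io"))` would block until the RayCluster CRD is served (the CRD name here is an assumption). From the shell, `kubectl wait --for=condition=established crd/<name>` performs a similar check.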
Strategy:
Related issue number
Closes #26377
Checks
I ran the test 12 times in the CI with these changes and didn't observe any failures.