[Release Test] Update cuda version in gpu docker cluster launcher image to 12.1 #42246
Merged: architkulkarni merged 2 commits into ray-project:master from architkulkarni:fix-gpu-docker-release-test on Jan 10, 2024
Conversation
Signed-off-by: Archit Kulkarni <[email protected]>
architkulkarni requested review from ericl, hongchaodeng, and a team as code owners on January 9, 2024 at 00:27
Release test running here: https://buildkite.com/ray-project/release/builds/5418. Assigning @stephanie-wang as core-oncall (codeowner) since Hongchao is out.
rickyyx approved these changes on Jan 10, 2024
architkulkarni added a commit to architkulkarni/ray that referenced this pull request on Jan 10, 2024: [Release Test] Update cuda version in gpu docker cluster launcher image to 12.1 (ray-project#42246)
architkulkarni added a commit to architkulkarni/ray that referenced this pull request on Jan 10, 2024: [Release Test] Update cuda version in gpu docker cluster launcher image to 12.1 (ray-project#42246)
vickytsang pushed a commit to ROCm/ray that referenced this pull request on Jan 12, 2024: [Release Test] Update cuda version in gpu docker cluster launcher image to 12.1 (ray-project#42246)
raulchen pushed a commit to raulchen/ray that referenced this pull request on Jan 19, 2024: [Release Test] Update cuda version in gpu docker cluster launcher image to 12.1 (ray-project#42246)
Why are these changes needed?
After the Ray 2.9 release, the release test for the GPU Docker example cluster YAML file started failing with:
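```
2023-12-23 03:00:43,078 VINFO command_runner.py:371 -- Running `docker run --rm --name ray_nvidia_docker -d -it -e LC_ALL=C.UTF-8 -e LANG=C.UTF-8 --shm-size='2301055426.56b' --runtime=nvidia --net=host rayproject/ray:latest-gpu bash`
24897079968c098daccf1ed65a0bea5d3d9e3df84de201ea20f1a34b0363975c
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=11.8, please update your driver to a newer version, or use an earlier cuda container: unknown.
```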
The likely cause is that Ray 2.9 increased the required CUDA version to 11.8. This PR updates the CUDA version used in the GCP VM image in the example cluster YAML file from 11.3 to 12.1. The test passes after this change.
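For context, a minimal sketch of the shape of the change in the example cluster YAML, assuming the `sourceImage` field used by Ray's GCP node configs and GCP's Deep Learning VM image-family naming (`common-cu113`, `common-cu121`); the exact keys and paths in the repository's YAML may differ, so treat this as illustrative rather than the verbatim diff:

```yaml
# Illustrative sketch only: the field names and image-family path are
# assumptions based on Ray's GCP example YAMLs and GCP Deep Learning VM
# naming conventions, not the verbatim diff from this PR.
available_node_types:
  ray_head_gpu:
    node_config:
      disks:
        - boot: true
          initializeParams:
            # Previously pointed at a CUDA 11.3 image family (common-cu113),
            # which fails the cuda>=11.8 requirement introduced with Ray 2.9.
            sourceImage: projects/deeplearning-platform-release/global/images/family/common-cu121
```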
Related issue number
Closes #42134
Checks
- I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- I've run `scripts/format.sh` to lint the changes in this PR.
- If I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.