[Train] Support full ray.get_gpu_ids() API in train.torch.get_device() #28659
Signed-off-by: Amog Kamsetty <[email protected]>
python/ray/train/tests/test_gpu.py
Outdated
    if cuda_visible_devices:
        assert os.environ["CUDA_VISIBLE_DEVICES"] == cuda_visible_devices
This will always be true, right? Or is this just to validate that the environment variable is being set correctly?
Yeah, this is to validate that it's being set correctly.
The actual test is just to make sure it can run without raising an error. The value of get_device should be the same regardless of what the env var is set to, as get_device returns the index.
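The point about the index can be illustrated with a small, self-contained sketch. It does not use Ray or PyTorch; `local_device_index` and `check` are hypothetical names, and the lookup is only an assumed stand-in for what get_device does, not its actual implementation. The idea is that a worker assigned the first visible GPU ends up with index 0 whatever physical IDs CUDA_VISIBLE_DEVICES lists.

```python
import os
from unittest import mock


def local_device_index(assigned_gpu_id: str) -> int:
    """Hypothetical stand-in: position of the assigned GPU among the visible ones."""
    visible = os.environ["CUDA_VISIBLE_DEVICES"].split(",")
    return visible.index(assigned_gpu_id)


def check(cuda_visible_devices: str) -> int:
    # Patch the environment only for the duration of this check.
    with mock.patch.dict(os.environ, {"CUDA_VISIBLE_DEVICES": cuda_visible_devices}):
        # Sanity check on the setup itself: true whenever the patch took effect.
        assert os.environ["CUDA_VISIBLE_DEVICES"] == cuda_visible_devices
        # Assumption: the worker was assigned the first visible GPU.
        assigned = cuda_visible_devices.split(",")[0]
        return local_device_index(assigned)


# Different physical IDs, same relative index for the first visible device.
results = [check(value) for value in ("0,1", "1,2", "7,3")]
```

Under these assumptions the environment-variable assert only guards the test setup, while the meaningful property is that the lookup runs without raising.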
Blocked on #29104
Lint failure is unrelated, going to merge.
ray.get_gpu_ids() sometimes returns a list of ints and sometimes a list of strings, depending on whether the user has set the CUDA_VISIBLE_DEVICES environment variable. This has led to the following issue: #28467.
It seems like solidifying the Ray Core API will require more discussion (follow the thread here: #28632), so we temporarily account for this in Ray Train for now.
Closes #28467
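As a rough illustration of the kind of workaround described above, the sketch below normalizes GPU IDs to strings before looking them up in CUDA_VISIBLE_DEVICES. It is not the code from this PR: `to_str_ids` and `device_indices` are hypothetical helpers, and the fallback used when the environment variable is unset is an assumption.

```python
from typing import List, Optional, Sequence, Union

GpuId = Union[int, str]


def to_str_ids(gpu_ids: Sequence[GpuId]) -> List[str]:
    """Normalize a mixed list of int/str GPU IDs to strings."""
    return [str(gpu_id) for gpu_id in gpu_ids]


def device_indices(
    gpu_ids: Sequence[GpuId], cuda_visible_devices: Optional[str]
) -> List[int]:
    """Map GPU IDs to indices relative to CUDA_VISIBLE_DEVICES (hypothetical helper)."""
    ids = to_str_ids(gpu_ids)
    if not cuda_visible_devices:
        # Assumption: with no env var set, the IDs already are device indices.
        return [int(gpu_id) for gpu_id in ids]
    visible = [entry.strip() for entry in cuda_visible_devices.split(",")]
    # list.index raises ValueError if an ID is not among the visible devices.
    return [visible.index(gpu_id) for gpu_id in ids]
```

Comparing everything as strings means the lookup behaves the same whether the caller received `[2, 3]` or `["2", "3"]`.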
Checks
Sign off every commit (git commit -s) in this PR.
Run scripts/format.sh to lint the changes in this PR.