[Train] Support full ray.get_gpu_ids() API in train.torch.get_device() #28659

Merged
18 commits merged into ray-project:master from fix-train-gpu-id
Oct 6, 2022

@amogkam amogkam commented Sep 21, 2022

Signed-off-by: Amog Kamsetty [email protected]

ray.get_gpu_ids() returns a list of ints in some cases and a list of strings in others, depending on whether the user has set the CUDA_VISIBLE_DEVICES environment variable. This inconsistency led to the following issue: #28467.

Solidifying the Ray Core API seems to require more discussion (follow the thread here: #28632), so for now we account for both return types in Ray Train.

Closes #28467
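
A minimal sketch of the kind of normalization described above, for readers following along. This is not the code from this PR: `resolve_device_index` is a hypothetical helper, and the rule that an unset CUDA_VISIBLE_DEVICES means the GPU ID is already a plain integer index is an assumption made for illustration. The idea is to convert every GPU ID to a string first, so int and string return values take the same path, and then look the ID up in the CUDA_VISIBLE_DEVICES list to get a device index.

```python
import os
from typing import Optional, Union


def resolve_device_index(
    gpu_id: Union[int, str], cuda_visible_devices: Optional[str] = None
) -> int:
    """Hypothetical helper: map a GPU ID (int or str) to a device index.

    If cuda_visible_devices is None, the value is read from the environment.
    """
    # Normalize first so that 2 and "2" are treated identically.
    gpu_id = str(gpu_id)

    if cuda_visible_devices is None:
        cuda_visible_devices = os.environ.get("CUDA_VISIBLE_DEVICES", "")

    if cuda_visible_devices:
        visible = [d.strip() for d in cuda_visible_devices.split(",")]
        # The device index is the position of the ID in the visible list.
        # list.index raises ValueError if the ID is not visible at all.
        return visible.index(gpu_id)

    # Assumption: with no CUDA_VISIBLE_DEVICES set, the ID is an integer index.
    return int(gpu_id)
```

With this sketch, `resolve_device_index(2, "1,2")` and `resolve_device_index("2", "1,2")` both resolve to the same index, which is the property the PR description is after.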

Why are these changes needed?

Related issue number

Checks

  • I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Amog Kamsetty <[email protected]>
@amogkam changed the title from "[Train] Support full ray.get_gpu_ids() API" to "[Train] Support full ray.get_gpu_ids() API in train.torch.get_device()" Sep 21, 2022
Signed-off-by: Amog Kamsetty <[email protected]>
Comment on lines 100 to 101
    if cuda_visible_devices:
        assert os.environ["CUDA_VISIBLE_DEVICES"] == cuda_visible_devices
Contributor

This assertion will always be true, right? Or is it just there to validate that the environment variable is being set correctly?

Contributor Author

Yeah, this is to validate that it's being set correctly.

The actual test is just to make sure it can run without raising an error. The value of get_device should be the same regardless of what the env var is set to, since get_device returns the index.
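
One possible shape for such a check, sketched without Ray so it runs on its own. Everything here is illustrative: `local_device_index` is a hypothetical stand-in for the index lookup (it is not `train.torch.get_device()`), and `cuda_visible_devices` and `check_device_index` are helper names made up for this example. The sketch sets the environment variable temporarily, asserts that it was set as expected, and then confirms the lookup runs without raising.

```python
import os
from contextlib import contextmanager


@contextmanager
def cuda_visible_devices(value):
    """Temporarily set CUDA_VISIBLE_DEVICES, or unset it when value is None."""
    old = os.environ.get("CUDA_VISIBLE_DEVICES")
    try:
        if value is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = value
        yield
    finally:
        # Restore the previous state so other tests are unaffected.
        if old is None:
            os.environ.pop("CUDA_VISIBLE_DEVICES", None)
        else:
            os.environ["CUDA_VISIBLE_DEVICES"] = old


def local_device_index(gpu_id):
    """Hypothetical stand-in for the lookup; not Ray's get_device()."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES", "")
    if visible:
        return visible.split(",").index(str(gpu_id))
    return int(gpu_id)


def check_device_index(gpu_id, value):
    """Run the lookup under a given CUDA_VISIBLE_DEVICES setting."""
    with cuda_visible_devices(value):
        if value:
            # Sanity check only: confirms the variable was actually set.
            assert os.environ["CUDA_VISIBLE_DEVICES"] == value
        # The real check: this must not raise.
        return local_device_index(gpu_id)
```

In this sketch, the first visible GPU maps to the same index whether it is identified as the string "1" under CUDA_VISIBLE_DEVICES="1,2" or as the int 0 with the variable unset.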

@amogkam
Contributor Author

amogkam commented Oct 6, 2022

Blocked on #29104

@amogkam
Contributor Author

amogkam commented Oct 6, 2022

Lint failure is unrelated, going to merge

@amogkam amogkam merged commit 4732753 into ray-project:master Oct 6, 2022
@amogkam amogkam deleted the fix-train-gpu-id branch October 6, 2022 19:38
maxpumperla pushed a commit that referenced this pull request Oct 7, 2022
…ce()` (#28659)
WeichenXu123 pushed a commit to WeichenXu123/ray that referenced this pull request Dec 19, 2022
…ce()` (ray-project#28659)

Signed-off-by: Weichen Xu <[email protected]>