Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[placement groups] fix gpu ids for bundles #14574

Merged
merged 2 commits into from
Mar 9, 2021

Conversation

richardliaw
Copy link
Contributor

@richardliaw richardliaw commented Mar 9, 2021

Why are these changes needed?

Closes #14542

Related issue number

Checks

  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Richard Liaw <[email protected]>
@richardliaw richardliaw merged commit ea7d4c6 into ray-project:master Mar 9, 2021
@richardliaw richardliaw deleted the fix-gpu-ids branch March 9, 2021 23:12
@@ -424,12 +424,14 @@ def get_gpu_ids():

# TODO(ilr) Handle inserting resources in local mode
all_resource_ids = global_worker.core_worker.resource_ids()
assigned_ids = []
assigned_ids = set()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@richardliaw, kind of an edge case, but the order of the devices in CUDA_VISIBLE_DEVICES actually does have some implications. Namely, since CUDA assigns IDs based on the fastest device first (see CUDA_DEVICE_ORDER), you usually want to use device 0 first when available.

One alternative might be to use an order-preserving dedup instead: https://stackoverflow.com/questions/480214/how-do-you-remove-duplicates-from-a-list-whilst-preserving-order/39835527#39835527

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, what if we just sort the returned list at 434? I think Ray doesn't have any ordering guarantees from the detected CUDA devices, though maybe we can support that later

Copy link
Contributor

@tgaddair tgaddair Mar 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that works! May want to double check if the ids here are strings or ints for that, though (would want to sort based on numerical value for strings).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

CUDA_VISIBLE_DEVICES is not set properly when using placement groups with GPUs
3 participants