[RELAND] Add xpu to getAccelerator #129205
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129205
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV — there is 1 currently active SEV. If your PR is affected, please view it below.
❌ 2 New Failures, 1 Cancelled Job — as of commit f17b3da with merge base e2e624a:
NEW FAILURES: the following jobs have failed.
CANCELLED JOB: the following job was cancelled. Please retry.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 32321324d337f1d57b1e46695f5c971d1f74c8bf Pull Request resolved: #129205
[ghstack-poisoned]
Nice cleanup, thanks!
aten/src/ATen/DeviceAccelerator.cpp (Outdated)
```cpp
  if (at::has##device_name()) {       \
    device_type = k##device_name;     \
    TORCH_CHECK(                      \
        !is_mutex_device_detected,    \
```
Do you mean "mutex" or "multiple" here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant `mutex`, but `multiple` is better. I will update it to `multiple`.
I went through the code and now think that `mutex` is better: it indicates that a mutually exclusive device has been detected. We assign `is_mutex_device_detected` to `true` when the first accelerator is detected, and check it at each subsequent accelerator detection to ensure mutual exclusiveness in PyTorch.
I think "mutex" is a bit overloaded in the c++ context and I would avoid using it unless you mean the std::mutex class.
But this is a small details, feel free to ignore if you prefer this.
What you said makes sense. To avoid a misleading name, I changed `ASSIGN_ACCELERATOR_AND_CHECK_MUTEX` to `DETECT_AND_ASSIGN_ACCELERATOR` and `is_mutex_device_detected` to `is_accelerator_detected`.
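For context, here is a minimal, self-contained sketch of the detection pattern discussed above, using the renamed macro and flag. The stand-ins (`DeviceType`, `has_CUDA`, `has_XPU`, and the local `TORCH_CHECK`) are illustrative assumptions, not the actual ATen implementation:

```cpp
#include <cstdlib>
#include <iostream>

// Stand-ins for ATen's device enum and detection helpers (assumptions).
enum class DeviceType { CPU, CUDA, XPU };
static bool has_CUDA() { return true; }
static bool has_XPU() { return false; }

// Simplified stand-in for ATen's TORCH_CHECK.
#define TORCH_CHECK(cond, msg)                                 \
  do {                                                         \
    if (!(cond)) { std::cerr << (msg) << "\n"; std::abort(); } \
  } while (0)

// One invocation per backend: if the backend is present, record its device
// type, assert that no other accelerator was detected earlier, and set the
// flag so any later detection trips the check.
#define DETECT_AND_ASSIGN_ACCELERATOR(device_name)            \
  if (has_##device_name()) {                                  \
    device_type = DeviceType::device_name;                    \
    TORCH_CHECK(!is_accelerator_detected,                     \
                "Cannot have more than one accelerator!");    \
    is_accelerator_detected = true;                           \
  }

int main() {
  DeviceType device_type = DeviceType::CPU;
  bool is_accelerator_detected = false;
  DETECT_AND_ASSIGN_ACCELERATOR(CUDA)
  DETECT_AND_ASSIGN_ACCELERATOR(XPU)
  std::cout << (device_type == DeviceType::CUDA ? "cuda" : "no cuda") << "\n";
  return 0;
}
```

The flag makes the mutual exclusiveness explicit: the first detected backend sets it, and any second detection fails the check rather than silently overwriting the device type.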
ghstack-source-id: 992194c12a3d133a083db79ae588d0c76ed2a9a6 Pull Request resolved: #129205
# Motivation Add `xpu` support to `getAccelerator`. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 gujinghui EikanWang fengyuan14 [ghstack-poisoned]
ghstack-source-id: 71d0dc3fb56dc10c7d7f02dbb732a17b5d6c6237 Pull Request resolved: pytorch#129205
# Motivation Add `xpu` support to `getAccelerator`. Pull Request resolved: pytorch#129205 Approved by: https://github.com/albanD, https://github.com/gujinghui ghstack dependencies: pytorch#129463
@pytorchbot revert -m "Need to revert #129463 which breaks Meta builds" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 3e2df3c. Reverted #129205 on behalf of https://github.com/kit1980 due to Need to revert #129463 which breaks Meta builds (comment)
@guangyey your PR has been successfully reverted.
# Motivation Add `xpu` support to `getAccelerator`. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 gujinghui EikanWang fengyuan14 [ghstack-poisoned]
Unrelated failures.
Merge started. Your change will be merged while ignoring the following 2 checks: xpu / linux-jammy-xpu-py3.8 / test (default, 1, 4, linux.idc.xpu), trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot merge -f 'Unrelated failures; the trunk / macos-py3-arm64 / build (push) CI is hanging and the code change is unrelated to mps, so ignore it.'
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
# Motivation Add `xpu` support to `getAccelerator`. cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 gujinghui EikanWang fengyuan14 [ghstack-poisoned]
Pull Request resolved: #129363 Approved by: https://github.com/EikanWang, https://github.com/gujinghui, https://github.com/albanD ghstack dependencies: #129463, #129205
…en (#129119)

# Motivation
Before this PR, device construction defaulted to the `cuda` type when only a device index was given. It also returns the `PrivateUse1` type if a `PrivateUse1` backend is registered.

```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```

This works well on a CUDA GPU, but it raises a misleading error when running on XPU:

```bash
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```

With this PR, the logic is refined to use the currently available device type instead.

Pull Request resolved: #129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
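For illustration, a minimal, self-contained sketch of the idea in that commit: resolve a bare device index to a device type via the detected accelerator instead of hard-wiring CUDA. All names here (the `getAccelerator` stub, `DeviceType`, `deviceTypeFromIndex`) are stand-ins, not the actual PyTorch implementation:

```cpp
#include <iostream>
#include <optional>

// Stand-in device enum (assumption; PyTorch uses c10::DeviceType).
enum class DeviceType { CPU, CUDA, XPU };

// Stand-in for the accelerator query; pretend an XPU was detected at runtime.
std::optional<DeviceType> getAccelerator() { return DeviceType::XPU; }

// Before the change: torch.device(0) was hard-wired to CUDA.
// After the change: the type comes from whichever accelerator is available.
DeviceType deviceTypeFromIndex(int /*index*/) {
  if (auto acc = getAccelerator()) {
    return *acc;  // e.g. CUDA on NVIDIA systems, XPU on Intel GPUs
  }
  return DeviceType::CPU;  // no accelerator detected
}

int main() {
  std::cout << (deviceTypeFromIndex(0) == DeviceType::XPU ? "xpu:0" : "other")
            << "\n";
  return 0;
}
```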
Stack from ghstack (oldest at bottom):
Motivation

Add `xpu` support to `getAccelerator`.

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @gujinghui @EikanWang @fengyuan14