
[RELAND] Add xpu to getAccelerator #129205

Closed
wants to merge 16 commits

Conversation


pytorch-bot bot commented Jun 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129205

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 1 Cancelled Job

As of commit f17b3da with merge base e2e624a:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Jun 21, 2024
ghstack-source-id: 32321324d337f1d57b1e46695f5c971d1f74c8bf
Pull Request resolved: #129205
@guangyey guangyey added the intel, module: xpu, ciflow/trunk, ciflow/xpu, and release notes: xpu labels Jun 21, 2024
@gujinghui gujinghui requested a review from EikanWang June 21, 2024 06:15
[ghstack-poisoned]
Collaborator

@albanD albanD left a comment

Nice cleanup, thanks!

```cpp
  if (at::has##device_name()) {    \
    device_type = k##device_name;  \
    TORCH_CHECK(                   \
        !is_mutex_device_detected, \
```
Collaborator

Do you mean "mutex" or "multiple" here?

Collaborator Author

I meant mutex, but multiple is better. I will update it to multiple.

Collaborator Author

I went through the code and I think mutex is better. It means that a mutually exclusive device has been detected: we assign is_mutex_device_detected to True when the first accelerator is detected, and check it on each subsequent accelerator detection to enforce mutual exclusiveness in PyTorch.

Collaborator

I think "mutex" is a bit overloaded in the c++ context and I would avoid using it unless you mean the std::mutex class.
But this is a small details, feel free to ignore if you prefer this.

Collaborator Author

What you said makes sense. To avoid a misleading name, I changed ASSIGN_ACCELERATOR_AND_CHECK_MUTEX to DETECT_AND_ASSIGN_ACCELERATOR and is_mutex_device_detected to is_accelerator_detected.
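For reference, a minimal sketch of what the renamed macro might look like, reconstructed from the hunk quoted above and the renames agreed on here; the exact upstream definition, surrounding variables, and error message are assumptions:

```cpp
// Hypothetical reconstruction, not the verbatim PyTorch macro.
// device_type and is_accelerator_detected are assumed locals of the
// surrounding detection routine; the check message is illustrative.
#define DETECT_AND_ASSIGN_ACCELERATOR(device_name)         \
  if (at::has##device_name()) {                            \
    device_type = k##device_name;                          \
    TORCH_CHECK(                                           \
        !is_accelerator_detected,                          \
        "Only one accelerator is allowed per build, but ", \
        device_type, " was detected after another one.");  \
    is_accelerator_detected = true;                        \
  }
```

Expanded once per candidate backend (for example `DETECT_AND_ASSIGN_ACCELERATOR(CUDA)` followed by `DETECT_AND_ASSIGN_ACCELERATOR(XPU)`), the `TORCH_CHECK` only fires when a second accelerator is detected in the same build.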

guangyey added a commit that referenced this pull request Jun 24, 2024
ghstack-source-id: 992194c12a3d133a083db79ae588d0c76ed2a9a6
Pull Request resolved: #129205
# Motivation
Add `xpu` support to `getAccelerator`.


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 gujinghui EikanWang fengyuan14

[ghstack-poisoned]
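For illustration, a hedged usage sketch assuming the C++ entry point of this era, `at::getAccelerator(bool checked)`, which returns an optional device type; the header path and exact signature are assumptions to verify against the source tree:

```cpp
#include <iostream>

#include <ATen/DeviceAccelerator.h> // assumed location of at::getAccelerator

int main() {
  // After this PR, an XPU build reports XPU here, just as a CUDA build
  // reports CUDA. checked=false means the result may be empty.
  if (auto acc = at::getAccelerator(/*checked=*/false)) {
    std::cout << "Detected accelerator: " << *acc << "\n";
  } else {
    std::cout << "No accelerator detected\n";
  }
  return 0;
}
```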
OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request Jul 2, 2024
ghstack-source-id: 71d0dc3fb56dc10c7d7f02dbb732a17b5d6c6237
Pull Request resolved: pytorch#129205
pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request Jul 2, 2024
# Motivation
Add `xpu` support to `getAccelerator`.

Pull Request resolved: pytorch#129205
Approved by: https://github.com/albanD, https://github.com/gujinghui
ghstack dependencies: pytorch#129463
@kit1980
Member

kit1980 commented Jul 3, 2024

@pytorchbot revert -m "Need to revert #129463 which breaks Meta builds" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jul 3, 2024
This reverts commit 3e2df3c.

Reverted #129205 on behalf of https://github.com/kit1980 due to the need to revert #129463, which breaks Meta builds.
@pytorchmergebot
Collaborator

@guangyey your PR has been successfully reverted.

@guangyey guangyey changed the title from "Add xpu to getAccelerator" to "[RELAND] Add xpu to getAccelerator" Jul 4, 2024
@guangyey
Collaborator Author

guangyey commented Jul 4, 2024

Unrelated failures.
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: xpu / linux-jammy-xpu-py3.8 / test (default, 1, 4, linux.idc.xpu), trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@guangyey
Collaborator Author

guangyey commented Jul 4, 2024

@pytorchbot merge -f 'unrelated failures; the trunk / macos-py3-arm64 / build (push) CI is hanging, and the code change is unrelated to mps, so ignore it.'

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 15, 2024
…en (#129119)

# Motivation
Before this PR, device construction defaulted to the `cuda` type when only a device index was given; it returned the `PrivateUser1` type instead if a `PrivateUser1` backend was registered.
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
This works well on a CUDA GPU, but it reports a misleading device type and raises an error when running on XPU.
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
With this PR, the logic is refined to use the currently available device type instead, as sketched below.
Pull Request resolved: #129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
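A hedged sketch of the refined rule described in this commit message; the helper name and the CPU fallback are illustrative assumptions, not the actual diff:

```cpp
#include <ATen/DeviceAccelerator.h> // assumed location of at::getAccelerator
#include <c10/core/Device.h>

// Hypothetical helper: a bare device index now binds to the currently
// available accelerator instead of a hard-coded CUDA device type.
c10::Device device_from_bare_index(c10::DeviceIndex index) {
  auto acc = at::getAccelerator(/*checked=*/false);
  // Fallback when no accelerator is present; CPU is chosen here only
  // for illustration.
  const auto type = acc.value_or(c10::DeviceType::CPU);
  return c10::Device(type, index);
}
```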
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Labels
- `ciflow/trunk` (Trigger trunk jobs on your pull request)
- `ciflow/xpu` (Run XPU CI tasks)
- `intel` (This tag is for PR from Intel)
- `Merged`
- `module: xpu` (Intel XPU related issues)
- `open source`
- `release notes: xpu` (release notes category)
- `Reverted`