
[RELAND] Add xpu to getAccelerator #129205

Closed
wants to merge 16 commits

Conversation


pytorch-bot bot commented Jun 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129205

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 2 New Failures, 1 Cancelled Job

As of commit f17b3da with merge base e2e624a:

NEW FAILURES - The following jobs have failed:

CANCELLED JOB - The following job was cancelled. Please retry:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

guangyey added a commit that referenced this pull request Jun 21, 2024
ghstack-source-id: 32321324d337f1d57b1e46695f5c971d1f74c8bf
Pull Request resolved: #129205
@guangyey guangyey added the intel, module: xpu, ciflow/trunk, ciflow/xpu, and release notes: xpu labels Jun 21, 2024
@gujinghui gujinghui requested a review from EikanWang June 21, 2024 06:15
[ghstack-poisoned]
Collaborator

@albanD albanD left a comment

Nice cleanup, thanks!

```cpp
  if (at::has##device_name()) {    \
    device_type = k##device_name;  \
    TORCH_CHECK(                   \
        !is_mutex_device_detected, \
```
Collaborator

Do you mean "mutex" or "multiple" here?

Collaborator Author

I meant mutex, but multiple is better. I will update it to multiple.

Collaborator Author

I went through the code and I think mutex is better. It means that a mutually exclusive device has been detected: we assign is_mutex_device_detected to True when the first accelerator is detected, and check it on each subsequent accelerator detection to enforce mutual exclusiveness in PyTorch.

Collaborator

I think "mutex" is a bit overloaded in the c++ context and I would avoid using it unless you mean the std::mutex class.
But this is a small details, feel free to ignore if you prefer this.

Collaborator Author

What you said makes sense. To avoid a misleading name, I changed ASSIGN_ACCELERATOR_AND_CHECK_MUTEX to DETECT_AND_ASSIGN_ACCELERATOR and is_mutex_device_detected to is_accelerator_detected.
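For reference, a minimal sketch of what the renamed macro might look like, reconstructed from the hunk quoted above and the renames agreed on here; the exact upstream definition, surrounding variables, and error message are assumptions:

```cpp
// Hypothetical reconstruction, not the verbatim PyTorch macro.
// device_type and is_accelerator_detected are assumed locals of the
// surrounding detection routine; the check message is illustrative.
#define DETECT_AND_ASSIGN_ACCELERATOR(device_name)         \
  if (at::has##device_name()) {                            \
    device_type = k##device_name;                          \
    TORCH_CHECK(                                           \
        !is_accelerator_detected,                          \
        "Only one accelerator is allowed per build, but ", \
        device_type, " was detected after another one.");  \
    is_accelerator_detected = true;                        \
  }
```

Expanded once per candidate backend (for example `DETECT_AND_ASSIGN_ACCELERATOR(CUDA)` followed by `DETECT_AND_ASSIGN_ACCELERATOR(XPU)`), the `TORCH_CHECK` only fires when a second accelerator is detected in the same build.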

guangyey added a commit that referenced this pull request Jun 24, 2024
ghstack-source-id: 992194c12a3d133a083db79ae588d0c76ed2a9a6
Pull Request resolved: #129205
# Motivation
Add `xpu` support to `getAccelerator`.


cc jgong5 mingfeima XiaobingSuper sanchitintel ashokei jingxu10 gujinghui EikanWang fengyuan14

[ghstack-poisoned]
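For illustration, a hedged usage sketch assuming the C++ entry point of this era, `at::getAccelerator(bool checked)`, which returns an optional device type; the header path and exact signature are assumptions to verify against the source tree:

```cpp
#include <iostream>

#include <ATen/DeviceAccelerator.h> // assumed location of at::getAccelerator

int main() {
  // After this PR, an XPU build reports XPU here, just as a CUDA build
  // reports CUDA. checked=false means the result may be empty.
  if (auto acc = at::getAccelerator(/*checked=*/false)) {
    std::cout << "Detected accelerator: " << *acc << "\n";
  } else {
    std::cout << "No accelerator detected\n";
  }
  return 0;
}
```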
OnlyFor pushed a commit to OnlyFor/pytorch that referenced this pull request Jul 2, 2024
ghstack-source-id: 71d0dc3fb56dc10c7d7f02dbb732a17b5d6c6237
Pull Request resolved: pytorch#129205
pytorchmergebot pushed a commit to khushi-411/pytorch that referenced this pull request Jul 2, 2024
# Motivation
Add `xpu` support to `getAccelerator`.

Pull Request resolved: pytorch#129205
Approved by: https://github.com/albanD, https://github.com/gujinghui
ghstack dependencies: pytorch#129463
@kit1980
Member

kit1980 commented Jul 3, 2024

@pytorchbot revert -m "Need to revert #129463 which breaks Meta builds" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Jul 3, 2024
This reverts commit 3e2df3c.

Reverted #129205 on behalf of https://github.com/kit1980 due to the need to revert #129463, which breaks Meta builds.
@pytorchmergebot
Collaborator

@guangyey your PR has been successfully reverted.

@guangyey guangyey changed the title from "Add xpu to getAccelerator" to "[RELAND] Add xpu to getAccelerator" Jul 4, 2024
@guangyey
Collaborator Author

guangyey commented Jul 4, 2024

Unrelated failures.
@pytorchbot merge -i

@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 2 checks: xpu / linux-jammy-xpu-py3.8 / test (default, 1, 4, linux.idc.xpu), trunk / linux-focal-rocm6.1-py3.8 / test (default, 2, 2, linux.rocm.gpu)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@guangyey
Collaborator Author

guangyey commented Jul 4, 2024

@pytorchbot merge -f 'unrelated failures; the trunk / macos-py3-arm64 / build (push) CI is hanging, and the code change is unrelated to mps, so ignore it.'

@pytorchmergebot
Collaborator

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see the pytorch-bot wiki.

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

pytorchmergebot pushed a commit that referenced this pull request Jul 15, 2024
…en (#129119)

# Motivation
Before this PR, device construction defaulted to the `cuda` type when only a device index was given; it returned the `PrivateUser1` type instead if a `PrivateUser1` backend was registered.
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
>>> b
tensor([1, 2], device='cuda:0')
```
This works well on a CUDA GPU, but it reports a misleading device type and raises an error when running on XPU.
```python
>>> import torch
>>> device = torch.device(0)
>>> device.type
'cuda'
>>> a = torch.tensor([1, 2])
>>> b = a.to(0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/xxx/pytorch/torch/cuda/__init__.py", line 302, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
```
With this PR, the logic is refined to use the currently available device type instead, as sketched below.
Pull Request resolved: #129119
Approved by: https://github.com/albanD, https://github.com/gujinghui, https://github.com/EikanWang
ghstack dependencies: #129463, #129205, #129363
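A hedged sketch of the refined rule described in this commit message; the helper name and the CPU fallback are illustrative assumptions, not the actual diff:

```cpp
#include <ATen/DeviceAccelerator.h> // assumed location of at::getAccelerator
#include <c10/core/Device.h>

// Hypothetical helper: a bare device index now binds to the currently
// available accelerator instead of a hard-coded CUDA device type.
c10::Device device_from_bare_index(c10::DeviceIndex index) {
  auto acc = at::getAccelerator(/*checked=*/false);
  // Fallback when no accelerator is present; CPU is chosen here only
  // for illustration.
  const auto type = acc.value_or(c10::DeviceType::CPU);
  return c10::Device(type, index);
}
```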
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
Labels
- `ciflow/trunk` (Trigger trunk jobs on your pull request)
- `ciflow/xpu` (Run XPU CI tasks)
- `intel` (This tag is for PR from Intel)
- `Merged`
- `module: xpu` (Intel XPU related issues)
- `open source`
- `release notes: xpu` (release notes category)
- `Reverted`