[Intel GPU] xpu-ops codegen via backend whitelist #127865

ZhiweiYan-96 · 2024-06-04T07:02:14Z

Motivation

This PR enhances gen.py so that it can generate code for a specific backend.

XPU operators currently have to be registered by hand. Developers cannot take advantage of the shared code that handles tensor metadata setup (such as strides, proxy outputs, and structured kernels). Manually porting code is error-prone and may lead to high maintenance effort.

We intend to enhance the backend_whitelist argument in gen.py to generate independent, backend-specific files. Backend developers can then use the generated code directly.

Usage

XPU ops are special in that they live in third_party/torch-xpu-ops, so their codegen runs in a separate process.

We use the following command to trigger XPU codegen:

python -m torchgen.gen --source-path path/to/yaml/of/xpu --install-dir build/aten/src/ATen/xpu --per-operator-headers --static-dispatch-backend --backend-whitelist=XPU

The key difference is --backend-whitelist=XPU. backend-whitelist is an existing argument in torchgen, but we observed that it can hardly be used to generate source files such as Registerxxx.cpp. We enhance the backend-whitelist argument in gen.py so that it can be used to generate backend-only code, for example build/aten/src/ATen/xpu/RegisterXPU.cpp and build/aten/src/ATen/xpu/ops/as_strided_native.h.
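To make the whitelist idea concrete, here is a minimal standalone sketch. It is not torchgen's actual code: the function names and the plain-string dispatch keys are hypothetical, and the handling of backend-agnostic keys is deliberately left out. It only illustrates how a whitelist narrows the set of backends for which a registration file would be emitted.

```python
from typing import Iterable, List, Optional


def select_backends(backends: Iterable[str],
                    whitelist: Optional[Iterable[str]]) -> List[str]:
    """Keep only whitelisted backends; an empty or missing whitelist keeps all.

    Hypothetical helper for illustration, not a torchgen API.
    """
    if not whitelist:
        return list(backends)
    allowed = set(whitelist)
    return [b for b in backends if b in allowed]


def register_file_names(backends: Iterable[str]) -> List[str]:
    """Map each backend to a registration file name, following the
    Register<Backend>.cpp pattern seen in the description above."""
    return [f"Register{b}.cpp" for b in backends]


if __name__ == "__main__":
    selected = select_backends(["CPU", "CUDA", "XPU"], ["XPU"])
    print(register_file_names(selected))
```

With the whitelist set to XPU, only the XPU registration file remains in the list, which mirrors the intent of generating backend-only code.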

The inputs to gen.py are code templates and an operator yaml. We share the same templates as aten. A simplified yaml lives in third_party/torch-xpu-ops and includes only the supported XPU operators. This yaml is a copy-and-modify of native_functions.yaml: no extra entries are added, and the format is the same as the one in aten.
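The "no extra entry is added" property can be checked mechanically. The sketch below is an assumption-laden illustration: it works on already-parsed entries (plain dicts with a `func` key, as in the yaml format), and the sample entries and kernel name are made up for the example rather than copied from either real file.

```python
from typing import Dict, List, Set


def op_name(func_schema: str) -> str:
    """Return the operator name (including overload) from a schema string,
    i.e. everything before the opening parenthesis."""
    return func_schema.split("(", 1)[0].strip()


def extra_entries(upstream: List[Dict], backend: List[Dict]) -> Set[str]:
    """Names present in the backend yaml but missing upstream.

    An empty result means the backend yaml is a subset of upstream.
    """
    upstream_names = {op_name(e["func"]) for e in upstream}
    return {op_name(e["func"]) for e in backend} - upstream_names


# Illustrative entries only; real yaml entries carry more fields.
upstream = [
    {"func": "add.Tensor(Tensor self, Tensor other, *, Scalar alpha=1) -> Tensor"},
    {"func": "as_strided(Tensor(a) self, SymInt[] size, SymInt[] stride) -> Tensor(a)"},
]
xpu = [
    {"func": "as_strided(Tensor(a) self, SymInt[] size, SymInt[] stride) -> Tensor(a)",
     "dispatch": {"XPU": "as_strided_xpu"}},  # hypothetical kernel name
]

if __name__ == "__main__":
    print(extra_entries(upstream, xpu))
```

A non-empty result would point at an operator that exists only in the backend yaml and therefore breaks the copy-and-modify assumption.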

Result

All operator headers are generated independently in build/aten/src/ATen/xpu/ops, so they do not affect operators declared or defined by CPU, CUDA, or any other backend. XPU operators only include headers from this folder.
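As a rough sketch of the resulting layout, the helper below lists the files one would expect under the install directory. The function is hypothetical, and the naming pattern is inferred only from the two example paths given in the Usage section (RegisterXPU.cpp and ops/as_strided_native.h); the real generator emits more files than this.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def expected_outputs(install_dir: str, backend: str,
                     ops: Iterable[str]) -> List[str]:
    """List the registration file and per-operator native headers
    expected under install_dir (illustrative pattern, not exhaustive)."""
    root = PurePosixPath(install_dir)
    paths = [root / f"Register{backend}.cpp"]
    paths += [root / "ops" / f"{op}_native.h" for op in ops]
    return [str(p) for p in paths]


if __name__ == "__main__":
    for path in expected_outputs("build/aten/src/ATen/xpu", "XPU", ["as_strided"]):
        print(path)
```

Because every path is rooted at the XPU install directory, nothing in this list overlaps with the headers generated for other backends.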

Verification

We add a UT for verifying XPU codegen in test_codegen.py, which can be triggered with:
python -m tools.test.test_codegen -k BackendWhitelist
In third_party/torch-xpu-ops, we migrate some operators to the structured-kernel style, where they are registered through REGISTER_XPU_DISPATCH or TORCH_IMPL_FUNC. This is tested with test_xpu.py.

Others

In the longer term, we would like to unify the code generation system with that of other backends as well, such as CUDA and MPS.

Stack from ghstack (oldest at bottom):

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @awgu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @chauhang @d4l3k @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @mcarilli @ptrblck @leslie-fang-intel @EikanWang @voznesenskym @penguinwu @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @LucasLLC @MeetVadakkanchery @mhorowitz

[ghstack-poisoned]

pytorch-bot · 2024-06-04T07:02:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/127865

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 39 Cancelled Jobs, 13 Unrelated Failures

As of commit b31d00f with merge base 3a18577:

CANCELLED JOBS - The following jobs were cancelled. Please retry:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Mac MPS / macos-py3-arm64 / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / before-test / llm-retrieval (gh) (matched llm-retrieval rule in flaky-rules.json)
##[error]The operation was canceled.
trunk / libtorch-linux-focal-cuda12.1-py3.7-gcc9-debug / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / libtorch-linux-focal-cuda12.4-py3.7-gcc9-debug / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / linux-focal-cuda11.8-py3.10-gcc9-experimental-split-build / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / linux-focal-cuda12.1-py3.10-gcc9-no-ops / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / linux-focal-cuda12.4-py3.10-gcc9-experimental-split-build / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / linux-focal-cuda12.4-py3.10-gcc9-no-ops / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / linux-focal-cuda12.4-py3.10-gcc9-sm86 / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / macos-py3-arm64 / build (gh) (similar failure)
##[error]The operation was canceled.
trunk / win-vs2019-cpu-py3 / build (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.
trunk / win-vs2019-cuda11.8-py3 / build (gh) (matched win rule in flaky-rules.json)
##[error]The operation was canceled.
xpu / linux-jammy-xpu-py3.8 / build (gh) (similar failure)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

ghstack-source-id: 47b77bf426021ce6aa3c64c13c7d24ca38b13703 Pull Request resolved: #127865

[ghstack-poisoned]

ghstack-source-id: d2f0b7ca5b8f2d386a12b605e5ad1d19ba24041a Pull Request resolved: #127865

[ghstack-poisoned]

ghstack-source-id: d9b26a516a9ba776b7639a21a6df10cfc71f74a9 Pull Request resolved: #127865

torchgen/dest/register_dispatch_key.py

torchgen/gen.py

[ghstack-poisoned]

ghstack-source-id: 8c71248152e86debe227fcf2f964b90c3ff35bae Pull Request resolved: #127865

EikanWang

Please add UT.

torchgen/gen.py

torchgen/dest/register_dispatch_key.py

torchgen/gen.py

[ghstack-poisoned]

ghstack-source-id: 4fe1274ff777b79e4a9e0c164eb2bb1e3b19b89f Pull Request resolved: #127865

EikanWang · 2024-06-12T07:56:58Z

tools/test/test_gen_backendwhitelist.py

I suppose the test cases should be part of test_codegen.py rather than a dedicated test file.

Yes, I have moved the UT into test_codegen.py.

[ghstack-poisoned]

ghstack-source-id: 680b8b6b496a854d61f24327f7c5e35d86555433 Pull Request resolved: #127865

[ghstack-poisoned]

ghstack-source-id: 36beb6c4158282d776ab3246a969028618810405 Pull Request resolved: #127865

EikanWang · 2024-06-16T09:22:23Z

@ZhiweiYan-96, please fix the UT failures; we need to enable it ASAP.

EikanWang · 2024-06-16T09:24:43Z

aten/src/ATen/xpu/xpu_functions.yaml

@@ -0,0 +1,55 @@
+# This yaml is used only for testing the XPU codegen functionality. 


Why do we need to add this yaml to the source code? I think it should be a data file in the test folder. Why is it not part of test/xpu?

Indeed, I have moved it into test/xpu. Thanks for the comment.

[ghstack-poisoned]

ghstack-source-id: 9b4b4ec286689733ab3da6d6b72e5f24b8203897 Pull Request resolved: #127865

[ghstack-poisoned]

ghstack-source-id: 8f93fe94e2b0964208b29a7734f91756b48c0823 Pull Request resolved: #127865

[ghstack-poisoned]

ghstack-source-id: f3d8f75998366b2efc771df01483c97718377a8a Pull Request resolved: pytorch/pytorch#127865

Update

b5fd96d

[ghstack-poisoned]

ZhiweiYan-96 mentioned this pull request Jun 4, 2024

[Intel GPU] Dispatch Stub support #127860

Closed

pytorch-bot bot added the topic: not user facing topic category label Jun 4, 2024

ZhiweiYan-96 added a commit that referenced this pull request Jun 4, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

ecc3774

ghstack-source-id: 47b77bf426021ce6aa3c64c13c7d24ca38b13703 Pull Request resolved: #127865

ZhiweiYan-96 marked this pull request as draft June 4, 2024 07:02

pytorchbot added the open source label Jun 4, 2024

Update

d97f238

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 6, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

500c711

ghstack-source-id: d2f0b7ca5b8f2d386a12b605e5ad1d19ba24041a Pull Request resolved: #127865

Update

d4e88da

[ghstack-poisoned]

Update

5554916

[ghstack-poisoned]

Update

1cb9e14

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 6, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

289b332

ghstack-source-id: d9b26a516a9ba776b7639a21a6df10cfc71f74a9 Pull Request resolved: #127865

gujinghui reviewed Jun 7, 2024

torchgen/dest/register_dispatch_key.py (outdated)

gujinghui reviewed Jun 7, 2024

torchgen/gen.py (outdated)

Update

56a0994

[ghstack-poisoned]

Update

22a0660

[ghstack-poisoned]

Update

5487aa7

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 11, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

d9cff66

ghstack-source-id: 8c71248152e86debe227fcf2f964b90c3ff35bae Pull Request resolved: #127865

EikanWang requested changes Jun 11, 2024

Update

e635daa

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 12, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

e8ca3fc

ghstack-source-id: 4fe1274ff777b79e4a9e0c164eb2bb1e3b19b89f Pull Request resolved: #127865

EikanWang requested changes Jun 12, 2024

Update

8e8bb33

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 13, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

98729ca

ghstack-source-id: 680b8b6b496a854d61f24327f7c5e35d86555433 Pull Request resolved: #127865

ZhiweiYan-96 added the ciflow/xpu Run XPU CI tasks label Jun 14, 2024

ZhiweiYan-96 added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 14, 2024

Update

16b75c3

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 14, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

1635cf3

ghstack-source-id: 36beb6c4158282d776ab3246a969028618810405 Pull Request resolved: #127865

EikanWang requested changes Jun 16, 2024

Update

97161cc

[ghstack-poisoned]

Update

60d2ba1

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 17, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

d637539

ghstack-source-id: 9b4b4ec286689733ab3da6d6b72e5f24b8203897 Pull Request resolved: #127865

Update

d419737

[ghstack-poisoned]

ZhiweiYan-96 added a commit that referenced this pull request Jun 19, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

f2e4a5c

ghstack-source-id: 8f93fe94e2b0964208b29a7734f91756b48c0823 Pull Request resolved: #127865

ZhiweiYan-96 mentioned this pull request Jun 25, 2024

[Intel GPU] Add XPU into device list of copy_impl #129452

Closed

Update

b31d00f

[ghstack-poisoned]

ZhiweiYan-96 closed this Jul 3, 2024

17336310621 pushed a commit to 17336310621/-PyTorch that referenced this pull request Aug 5, 2024

[Intel GPU] xpu-ops codegen via backend whitelist

e697dde

ghstack-source-id: f3d8f75998366b2efc771df01483c97718377a8a Pull Request resolved: pytorch/pytorch#127865

github-actions bot deleted the gh/ZhiweiYan-96/12/head branch August 10, 2024 01:57
