Add warpSize to Device properties #128449

ramcherukuri · 2024-06-11T21:00:16Z

Adding warp_size to CudaDeviceProperties.

```
>>> import torch
>>> prop = torch.cuda.get_device_properties(torch.cuda.current_device())
>>> prop.warp_size
64
```
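PyTorch builds that predate this change do not expose `warp_size`, so code that wants the value may need a fallback. Below is a minimal sketch that assumes a default of 32 on CUDA and 64 on ROCm (64 is the value the follow-up #129663 says was previously hard-coded for ROCm). The helpers `warp_size_of` and `warps_per_block` are hypothetical and not part of PyTorch. The properties object is passed in as a parameter so the sketch runs without a GPU.

```python
from types import SimpleNamespace


def warp_size_of(props, is_rocm: bool) -> int:
    """Return props.warp_size when present, else an assumed default.

    Assumed defaults: 32 on NVIDIA GPUs, 64 on ROCm (the value that was
    hard-coded for ROCm before this property existed).
    """
    default = 64 if is_rocm else 32
    return getattr(props, "warp_size", default)


def warps_per_block(threads_per_block: int, warp_size: int) -> int:
    """Warps (wavefronts on ROCm) needed to cover one thread block, rounded up."""
    return (threads_per_block + warp_size - 1) // warp_size


# Stand-ins for torch.cuda.get_device_properties(...) results.
new_build_props = SimpleNamespace(warp_size=64)  # build with this PR, MI250X-like
old_build_props = SimpleNamespace()              # build without warp_size

print(warp_size_of(new_build_props, is_rocm=True))   # 64
print(warp_size_of(old_build_props, is_rocm=False))  # 32
print(warps_per_block(256, warp_size_of(new_build_props, is_rocm=True)))  # 4
```

On a real machine you would pass `torch.cuda.get_device_properties(torch.cuda.current_device())` as `props`, and you could derive `is_rocm` from whether `torch.version.hip` is set. With the property available, the fallback branch never runs.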

@jeffdaily @pruthvistony @jithunnair-amd @ROCmSupport

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-06-11T21:00:19Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128449

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 1e68689 with merge base c12a4f2:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

periodic / win-vs2019-cuda11.8-py3 / test (default, 2, 4, windows.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
test_decomp 16/26 failed!
periodic / win-vs2019-cuda11.8-py3 / test (default, 4, 4, windows.g5.4xlarge.nvidia.gpu) (gh) (similar failure)
test_decomp 9/26 failed!
rocm / linux-focal-rocm6.1-py3.8 / test (default, 4, 6, linux.rocm.gpu.2) (gh) (detected as infra flaky with no log or failing log classifier)
trunk / linux-focal-rocm6.1-py3.8 / test (distributed, 1, 1, linux.rocm.gpu) (gh) (detected as infra flaky with no log or failing log classifier)

This comment was automatically generated by Dr. CI and updates every 15 minutes.

jeffdaily · 2024-06-15T04:39:19Z

@pytorchbot rebase

pytorchmergebot · 2024-06-15T04:40:53Z

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

pytorchmergebot · 2024-06-15T04:40:56Z

Successfully rebased warp_size-dev-prop onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout warp_size-dev-prop && git pull --rebase)

ramcherukuri · 2024-06-25T16:19:55Z

@malfet @huydhn, can you please help review/merge this PR? Thank you.

jithunnair-amd · 2024-06-27T22:28:10Z

@malfet Can you please approve/merge this PR? It's blocking another PR #129663

malfet · 2024-06-27T23:16:47Z

@pytorchbot merge

pytorchmergebot · 2024-06-27T23:19:24Z

Merge failed

Reason: This PR needs a release notes: label
If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

Details for Dev Infra team

Raised by workflow job

jeffdaily · 2024-06-28T17:11:08Z

@pytorchbot merge

pytorchmergebot · 2024-06-28T17:12:39Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-06-28T23:11:35Z

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
For more information see pytorch-bot wiki.

jataylo · 2024-07-01T09:11:15Z

@pytorchbot merge

pytorchmergebot · 2024-07-01T09:13:05Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

@jeffdaily

Adding warp_size to CudaDeviceProperties.

```
>>> import torch
>>> prop = torch.cuda.get_device_properties(torch.cuda.current_device())
>>> prop.warp_size
64
```

@jeffdaily @pruthvistony @jithunnair-amd @ROCmSupport

Co-authored-by: Jithun Nair
Pull Request resolved: pytorch#128449
Approved by: https://github.com/eqy, https://github.com/jataylo, https://github.com/jithunnair-amd, https://github.com/malfet

…9663)

As of ROCm 6.1, [hipDeviceProp_t::regsPerMultiprocessor](https://rocm.docs.amd.com/projects/HIP/en/latest/doxygen/html/structhip_device_prop__t.html#a7390d5b180d63978c81aa971060270b4) is available, allowing us to enable this attribute on ROCm.

```
>>> torch.cuda.get_device_properties(0)
_CudaDeviceProperties(name='AMD Instinct MI250X/MI250', major=9, minor=0, gcnArchName='gfx90a:sramecc+:xnack-', total_memory=65520MB, multi_processor_count=104)
>>> torch.cuda.get_device_properties(0).regs_per_multiprocessor
65536
```

With https://github.com/triton-lang/triton/pull/3962 we can extract n_regs and n_spills from a Triton binary built with the AMD backend, allowing us to enable inductor's dynamic_rblock_scaling (initially implemented in #115094) on ROCm.

Leaving this in draft until the following PRs have landed:
- #129361 to bump the Triton commit pin
- #128449 to allow us to grab warp_size from device properties instead of hard-coding 64 on ROCm.

Pull Request resolved: #129663
Approved by: https://github.com/jansel, https://github.com/shunting314
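To show why a hard-coded warp size matters for dynamic_rblock_scaling, here is a simplified, hypothetical sketch of a register-pressure check of that kind. It is not inductor's actual code. The function name, parameter names, and the `min_rblock=64` cutoff are assumptions; the parameters loosely mirror device-property fields. The key point is that the registers one block consumes are `n_regs * warp_size * num_warps`, so using 64 where the hardware warp is 32 (or the reverse) changes how many blocks are assumed to fit per SM/CU.

```python
def should_halve_rblock(
    n_regs: int,
    num_warps: int,
    rblock: int,
    total_blocks: int,
    warp_size: int,
    regs_per_multiprocessor: int,
    max_threads_per_multiprocessor: int,
    multi_processor_count: int,
    min_rblock: int = 64,  # assumed cutoff
) -> bool:
    """Hypothetical check: True if register pressure suggests halving RBLOCK."""
    # Leave small reduction blocks alone.
    if rblock <= min_rblock:
        return False
    # If every resident thread could be given this many registers,
    # register usage does not limit occupancy.
    if n_regs <= regs_per_multiprocessor // max_threads_per_multiprocessor:
        return False
    # Registers used by one block: per-thread regs * threads per warp * warps.
    regs_per_block = n_regs * warp_size * num_warps
    # Blocks whose registers fit in one SM/CU's register file (at least 1).
    max_blocks_per_sm = max(regs_per_multiprocessor // regs_per_block, 1)
    # Halve only if the grid exceeds what all SMs/CUs can hold at once.
    return total_blocks > max_blocks_per_sm * multi_processor_count


# MI250X-like values: 65536 registers and 104 CUs come from the commit
# message above; 2048 threads per CU is an assumption.
mi250x = dict(
    regs_per_multiprocessor=65536,
    max_threads_per_multiprocessor=2048,
    multi_processor_count=104,
)

# Same kernel, same grid: the decision flips with the assumed warp size.
print(should_halve_rblock(64, 4, 128, 800, warp_size=64, **mi250x))  # True
print(should_halve_rblock(64, 4, 128, 800, warp_size=32, **mi250x))  # False
```

With a 64-wide wavefront, each block needs 16384 registers, so 4 blocks fit per CU (416 across 104 CUs) and an 800-block grid triggers the halving. Assuming a 32-wide warp halves the per-block cost, so 832 blocks fit and the check passes. Reading `warp_size` from the device instead of hard-coding it avoids this kind of mismatch.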


ramcherukuri requested a review from eqy as a code owner June 11, 2024 21:00

pytorch-bot bot added the module: inductor label Jun 11, 2024

pytorchbot added the open source label Jun 11, 2024

eqy approved these changes Jun 11, 2024


pruthvistony added the ciflow/trunk, rocm, ciflow/rocm, ciflow/periodic and ciflow/inductor labels Jun 13, 2024

pruthvistony requested review from jeffdaily, pruthvistony and jithunnair-amd June 13, 2024 20:28

jataylo self-requested a review June 14, 2024 14:39

jataylo approved these changes Jun 14, 2024


ramcherukuri added 2 commits June 15, 2024 04:40

Add warpSize to Device properties

1e6d4b6

fix lint error

5b77544

pytorchmergebot force-pushed the warp_size-dev-prop branch from 36071ef to 5b77544 June 15, 2024 04:40

lint error remove spaces

edbae26

jeffdaily requested review from huydhn and malfet June 19, 2024 16:28

jataylo mentioned this pull request Jun 27, 2024

[ROCm] Enable ROCm support for inductor's dynamic_rblock_scaling #129663

Draft

jithunnair-amd approved these changes Jun 27, 2024


malfet approved these changes Jun 27, 2024


pytorchmergebot added the merging label Jun 27, 2024

pytorchmergebot removed the merging label Jun 27, 2024

jataylo added the release notes: rocm label Jun 28, 2024

ramcherukuri and others added 2 commits June 28, 2024 09:44

Update __init__.pyi.in resolving conflict.

00ecf2f

Merge branch 'main' into warp_size-dev-prop

1e68689

pytorchmergebot added the merging label Jun 28, 2024

malfet added the topic: improvements label Jun 28, 2024

pytorchmergebot closed this in f6a0be5 Jul 1, 2024

pytorchmergebot added Merged and removed merging labels Jul 1, 2024

jithunnair-amd mentioned this pull request Jul 1, 2024

Find ROCm on Fedora microsoft/DeepSpeed#5705

Merged

pruthvistony mentioned this pull request Jul 29, 2024

Fix hardcoded rocm warp size #125433

Draft
