Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ProcessGroupCudaP2P and change async-TP to use SymmetricMemory #128762

Closed
wants to merge 7 commits into from

Conversation

yifuwang
Copy link
Contributor

@yifuwang yifuwang commented Jun 15, 2024

Copy link

pytorch-bot bot commented Jun 15, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128762

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (4 Unrelated Failures)

As of commit 61ca931 with merge base dd00f5e (image):

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor module: inductor oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (pipeline) release notes category labels Jun 15, 2024
…emory"

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
yifuwang added a commit that referenced this pull request Jun 15, 2024
ghstack-source-id: a3eb6a484c6db3c9e7e077d448e4415792de872e
Pull Request resolved: #128762
…emory"

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
yifuwang added a commit that referenced this pull request Jun 17, 2024
ghstack-source-id: c41f11c4bfdf337d7a9ad67b89767dac54ebfc61
Pull Request resolved: #128762
@yifuwang yifuwang changed the title Remove ProcessGroupCudaP2P and port async-TP to SymmetricMemory Remove ProcessGroupCudaP2P and change async-TP to use SymmetricMemory Jun 17, 2024
@yifuwang yifuwang marked this pull request as ready for review June 17, 2024 20:58
@yifuwang yifuwang requested a review from a team as a code owner June 17, 2024 20:58
Copy link
Contributor

@wanchaol wanchaol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds great! Let's go!

@Chillee
Copy link
Contributor

Chillee commented Jun 17, 2024

How do we control whether SymmetricMemory is used now?

@yifuwang
Copy link
Contributor Author

@Chillee it is enabled via enable_symm_mem_for_group(group_name), which makes _SymmetricMemory.empty_strided work with the said group. The initialization cost is very small and the workspace is only initialized if used. Once the API matures we could enable it for all eligible process groups by default.

…etricMemory"


This PR removes `ProcessGroupCudaP2P` and changes async-TP to use `SymmetricMemory`. The async-TP implementation is still workspace-based, but it now doesn't require a buffer size to be specified upfront.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
yifuwang added a commit that referenced this pull request Jun 18, 2024
ghstack-source-id: 3ca8f70453393c523add323bccfe1d8554001036
Pull Request resolved: #128762
@jianc99
Copy link

jianc99 commented Jun 20, 2024

Hi, I am using gpt-fast to perform multi-GPU inference. When I set ENABLE_INTRA_NODE_COMM=1, I got the RuntimeError: : CUDASymmetricMemory requires pidfd_open and pidfd_getfd support. I am using ubuntu 20.04. Can I solve this problem without updating the kernels? Thanks.

…etricMemory"


This PR removes `ProcessGroupCudaP2P` and changes async-TP to use `SymmetricMemory`. The async-TP implementation is still workspace-based, but it now doesn't require a buffer size to be specified upfront.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
@yifuwang
Copy link
Contributor Author

Can I solve this problem without updating the kernels?

@jianc99 I can try to address this. Can you open an issue for it?

…etricMemory"


This PR removes `ProcessGroupCudaP2P` and changes async-TP to use `SymmetricMemory`. The async-TP implementation is still workspace-based, but it now doesn't require a buffer size to be specified upfront.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
@yifuwang yifuwang added topic: not user facing topic category and removed release notes: distributed (pipeline) release notes category labels Jun 25, 2024
@yifuwang
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 25, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

…etricMemory"


This PR removes `ProcessGroupCudaP2P` and changes async-TP to use `SymmetricMemory`. The async-TP implementation is still workspace-based, but it now doesn't require a buffer size to be specified upfront.

cc mrshenli pritamdamania87 zhaojuanmao satgera gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 awgu penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225 chauhang d4l3k voznesenskym EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire

[ghstack-poisoned]
@yifuwang
Copy link
Contributor Author

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

yifuwang added a commit to yifuwang/pytorch that referenced this pull request Jul 24, 2024
ghstack-source-id: a3eb6a484c6db3c9e7e077d448e4415792de872e
Pull Request resolved: pytorch#128762
@github-actions github-actions bot deleted the gh/yifuwang/93/head branch July 26, 2024 01:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged module: inductor oncall: distributed Add this issue/PR to distributed oncall triage queue topic: not user facing topic category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants