Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FSDP] Removed clamp to NO_SHARD for world size 1 #120334

Closed
wants to merge 9 commits into from

Conversation

awgu
Copy link
Contributor

@awgu awgu commented Feb 21, 2024

Stack from ghstack (oldest at bottom):

This is a workaround for #119937. NO_SHARD always using unsharded views breaks with DTensor extensions since some logic assumes that sharded views are torch.Tensors, but NO_SHARD makes sharded views and unsharded views the same (as DTensors).

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @rohan-varma

Copy link

pytorch-bot bot commented Feb 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/120334

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 9 Unrelated Failures

As of commit 0c04620 with merge base c82fcb7 (image):

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added release notes: distributed (fsdp) release notes category labels Feb 21, 2024
@github-actions github-actions bot added oncall: distributed Add this issue/PR to distributed oncall triage queue ciflow/inductor labels Feb 21, 2024
cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
awgu added a commit that referenced this pull request Feb 21, 2024
ghstack-source-id: 1343e0992603c0c1c72484b8537f00caa93b6642
Pull Request resolved: #120334
@awgu awgu marked this pull request as ready for review February 22, 2024 17:19
@awgu awgu requested review from fegin and wanchaol February 22, 2024 17:20
@mvpatel2000
Copy link
Contributor

@awgu maybe it's worth emitting a warning still that NO_SHARD might be more performant in this case? Not strong opinion though

@awgu
Copy link
Contributor Author

awgu commented Feb 22, 2024

We have been seeing some issues with NO_SHARD internally, so we were planning to deprecate it. We do want to provide a good solution for a replicated data parallelism that can compose with FSDP (and we have started discussing this).

@awgu awgu added the ciflow/trunk Trigger trunk jobs on your pull request label Feb 22, 2024
@mvpatel2000
Copy link
Contributor

#119937 (comment)

Flagging that I still see SHARD_GRAD_OP issue with this :( (but FULL_SHARD works fine!)

This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
awgu added a commit that referenced this pull request Feb 29, 2024
ghstack-source-id: 9a4fe05414874ef723665f7828c553b7ac68faa1
Pull Request resolved: #120334
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
awgu added a commit that referenced this pull request Mar 8, 2024
ghstack-source-id: b5f45b6c5bfdc8e340a01aea9234285b4e416054
Pull Request resolved: #120334
@awgu awgu marked this pull request as draft March 8, 2024 20:39
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
awgu added a commit that referenced this pull request Mar 15, 2024
ghstack-source-id: dd2dd51c1e0334ecfc9ca7fd16d716acdaa9ee02
Pull Request resolved: #120334
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensors`, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse H-Huang kwen2501 penguinwu fegin XilunWu wanchaol fduwjj wz337 tianyu-l wconstab yf225

[ghstack-poisoned]
awgu added a commit that referenced this pull request Apr 23, 2024
ghstack-source-id: d1451ca2dd53bc88d5b3da5ac71d994d2e023766
Pull Request resolved: #120334
Copy link

Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label.
Stale pull requests will automatically be closed after 30 days of inactivity.

@github-actions github-actions bot added the Stale label Jun 23, 2024
@awgu awgu closed this Jul 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: distributed (fsdp) release notes category Stale
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants