[FSDP] Removed clamp to NO_SHARD for world size 1 #120334
ghstack-source-id: 1343e0992603c0c1c72484b8537f00caa93b6642 Pull Request resolved: #120334
@awgu maybe it's still worth emitting a warning that NO_SHARD might be more performant in this case? No strong opinion, though.
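For concreteness, the warning suggested above could look roughly like the sketch below. This is not code from the PR: `ShardingStrategy` here is a local stand-in enum with three of the member names used by FSDP, and `keep_strategy_but_warn` is a hypothetical helper invented for illustration.

```python
import warnings
from enum import Enum, auto


class ShardingStrategy(Enum):
    """Local stand-in for FSDP's ShardingStrategy (assumption: only three members shown)."""

    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()
    NO_SHARD = auto()


def keep_strategy_but_warn(requested, world_size):
    """Hypothetical helper: honor the requested strategy, but hint at NO_SHARD on one rank."""
    if world_size == 1 and requested is not ShardingStrategy.NO_SHARD:
        warnings.warn(
            f"World size is 1; NO_SHARD might be more performant than {requested.name}.",
            stacklevel=2,
        )
    # The strategy is returned unchanged; nothing is clamped.
    return requested
```

The point of the sketch is only that the hint and the clamp are separable: the user keeps the strategy they asked for and still learns that `NO_SHARD` may be cheaper.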
We have been seeing some issues with |
Flagging that I still see the SHARD_GRAD_OP issue with this :( (but FULL_SHARD works fine!)
This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensor`s, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).
ghstack-source-id: 9a4fe05414874ef723665f7828c553b7ac68faa1 Pull Request resolved: #120334
ghstack-source-id: b5f45b6c5bfdc8e340a01aea9234285b4e416054 Pull Request resolved: #120334
ghstack-source-id: dd2dd51c1e0334ecfc9ca7fd16d716acdaa9ee02 Pull Request resolved: #120334
ghstack-source-id: d1451ca2dd53bc88d5b3da5ac71d994d2e023766 Pull Request resolved: #120334
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as |
Stack from ghstack (oldest at bottom):
- `NO_SHARD` for world size 1 #120334
- `ModuleList`/`ModuleDict` #124764

This is a workaround for #119937. `NO_SHARD` always using unsharded views breaks with `DTensor` extensions since some logic assumes that sharded views are `torch.Tensor`s, but `NO_SHARD` makes sharded views and unsharded views the same (as `DTensor`s).

cc @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @H-Huang @kwen2501 @penguinwu @fegin @XilunWu @wanchaol @fduwjj @wz337 @tianyu-l @wconstab @yf225 @chauhang @d4l3k @rohan-varma
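As a rough illustration of what "removing the clamp" means, here is a minimal sketch that contrasts the two behaviors. It is an assumption-laden stand-in, not the actual FSDP code: `ShardingStrategy` is a local enum with three of the member names used by FSDP, and both `resolve_*` functions are hypothetical names made up for this example.

```python
from enum import Enum, auto


class ShardingStrategy(Enum):
    """Local stand-in for FSDP's ShardingStrategy (assumption: only three members shown)."""

    FULL_SHARD = auto()
    SHARD_GRAD_OP = auto()
    NO_SHARD = auto()


def resolve_strategy_with_clamp(requested, world_size):
    """Hypothetical 'before' behavior: a single rank is forced onto NO_SHARD."""
    if world_size == 1 and requested is not ShardingStrategy.NO_SHARD:
        return ShardingStrategy.NO_SHARD
    return requested


def resolve_strategy_without_clamp(requested, world_size):
    """Hypothetical 'after' behavior: the requested strategy is honored as-is."""
    return requested


if __name__ == "__main__":
    requested = ShardingStrategy.FULL_SHARD
    print(resolve_strategy_with_clamp(requested, world_size=1).name)
    print(resolve_strategy_without_clamp(requested, world_size=1).name)
```

Under this sketch, a world-size-1 run that requests `FULL_SHARD` no longer ends up on the `NO_SHARD` path, which is the path the description says conflicts with `DTensor` extensions.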