[FSDP2] Used multi-grad hook when no inputs require grad #129259
Conversation
🔗 Helpful links: see artifacts and rendered test results at hud.pytorch.org/pr/129259.

Note: links to docs will display an error until the docs builds have completed. ✅ You can merge normally (2 unrelated failures). As of commit a05c807 with merge base 2820e1d: UNSTABLE. The following jobs failed but were likely due to flakiness present on trunk and have been marked as unstable.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
@@ -556,3 +584,35 @@ def forward(ctx, param_group: FSDPParamGroup, *inputs: torch.Tensor):
     def backward(ctx, *grads: torch.Tensor):
         ctx.param_group.post_backward()
         return (None,) + grads
```
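For readers following along, here is a hedged, self-contained sketch of the `torch.autograd.Function` pattern this diff extends (the class name, `callback` argument, and prints are illustrative, not FSDP2's exact internals): tensors are threaded through the function so that its `backward` runs a post-backward callback.

```python
import torch

class RegisterPostBackward(torch.autograd.Function):
    """Illustrative stand-in for the function above: pass tensors through
    unchanged so that backward runs a callback once gradients flow back
    past this point in the graph."""

    @staticmethod
    def forward(ctx, callback, *inputs: torch.Tensor):
        ctx.callback = callback
        return inputs  # tensors pass through unchanged

    @staticmethod
    def backward(ctx, *grads: torch.Tensor):
        ctx.callback()          # e.g. param_group.post_backward()
        return (None,) + grads  # None matches the non-tensor `callback` arg

x = torch.randn(2, requires_grad=True)
(y,) = RegisterPostBackward.apply(lambda: print("post-backward fired"), x)
y.sum().backward()  # the callback fires before x's .grad is accumulated
```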
For context, the existing multi-grad hook does not work so well with the existing `post_backward` because the last parameter's `.grad` is not assigned yet when the multi-grad hook runs (expected behavior). We actually prefer this multi-post-accumulate-grad hook, where the last parameter's `.grad` has already been assigned.
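To make that ordering concrete, here is a hedged toy sketch using the public APIs `torch.autograd.graph.register_multi_grad_hook` and `Tensor.register_post_accumulate_grad_hook` (the parameters and prints are illustrative only, not FSDP2 code):

```python
import torch
from torch.autograd.graph import register_multi_grad_hook

p1 = torch.randn(3, requires_grad=True)
p2 = torch.randn(3, requires_grad=True)

def multi_grad_hook(grads):
    # Fires once gradients for all watched tensors have been computed,
    # but before the last-reached parameter's .grad is accumulated.
    print("multi-grad hook:", p1.grad is not None, p2.grad is not None)

handle = register_multi_grad_hook((p1, p2), multi_grad_hook)

def post_acc_hook(param):
    # Fires only after .grad has been accumulated onto `param`.
    print("post-acc-grad hook:", param.grad is not None)  # True

h1 = p1.register_post_accumulate_grad_hook(post_acc_hook)
h2 = p2.register_post_accumulate_grad_hook(post_acc_hook)

(p1.sum() + p2.sum()).backward()
# Expected: the multi-grad hook prints at least one False; both
# post-acc-grad hook prints show True.
```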
```python
        ]
        self._multi_grad_hook_handle = _register_multi_post_acc_grad_hook(
            tensors, self._multi_grad_post_backward
        )
        return args, kwargs  # no tensors that require gradients
```
The key idea is that before, we just returned when no module inputs require grad, falling back to the final callback that runs at the end of backward to run this module's post-backward. This multi-post-acc-grad hook allows running the module's post-backward earlier (but still correctly); see the sketch below.
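The multi-post-acc-grad hook helper is internal, so the following is a hedged sketch of the underlying idea only: a one-shot version can be built from per-tensor `register_post_accumulate_grad_hook` calls plus a countdown (a real helper would also need to handle reuse across iterations, which this toy omits).

```python
import torch
from typing import Callable, Sequence
from torch.utils.hooks import RemovableHandle

def multi_post_acc_grad_hook_sketch(
    tensors: Sequence[torch.Tensor], fn: Callable[[], None]
) -> list[RemovableHandle]:
    # One-shot: call `fn` once every tensor's .grad has been accumulated.
    remaining = len(tensors)

    def hook(_param: torch.Tensor) -> None:
        nonlocal remaining
        remaining -= 1
        if remaining == 0:
            fn()  # every .grad is assigned by this point

    return [t.register_post_accumulate_grad_hook(hook) for t in tensors]

params = [torch.randn(2, requires_grad=True) for _ in range(3)]
handles = multi_post_acc_grad_hook_sketch(
    params, lambda: print("post-backward: all grads accumulated")
)
sum(p.sum() for p in params).backward()
```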
This is neat!
@awgu has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
This is very clean! Kudos!
Stack from ghstack (oldest at bottom):
- `register_multi_post_accumulate_grad_hook` #131949

This PR uses `register_multi_post_accumulate_grad_hook` to run the post-backward logic when the module inputs do not require gradient.

For context, FSDP2 currently relies on a hook on the module inputs that require gradient to run the post-backward logic (like a module full backward hook, except with pytree support). This means that if none of the module inputs require gradient, then the post-backward logic is deferred to the final callback that runs at the end of backward, which may not be timely in some hybrid parallelism cases (e.g. a sparse arch before the FSDP dense arch). To address this case, we use the new multi-post-accumulate-grad hook.

Since whether the module inputs require grad can change from iteration to iteration, we guard this on a flag to make sure that we use the existing logic if the module inputs do require gradient.
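As a toy illustration of the case this PR targets (hedged: the module and prints below are illustrative, not FSDP2 code), consider an embedding layer: its integer indices can never require gradient, so an input-based backward hook has nothing to attach to, yet a post-accumulate-grad hook on its parameters still fires during backward.

```python
import torch
import torch.nn as nn

emb = nn.Embedding(10, 4)
idx = torch.tensor([1, 2, 3])  # integer indices never require grad
print(idx.requires_grad)       # False -> the input-hook path is unusable

emb.weight.register_post_accumulate_grad_hook(
    lambda p: print("embedding post-backward, .grad assigned:", p.grad is not None)
)
emb(idx).sum().backward()  # hook fires here rather than at a final callback
```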
Common Case

In the common case, the only module whose forward inputs do not require gradient is the root FSDP module, where the root FSDP module is the overall model's root. In this case, the multi-grad hook simply moves the root's post-backward from the final callback to a preceding `AccumulateGrad` node.

New: ![Screenshot 2024-06-21 at 3 17 06 PM](https://github.com/pytorch/pytorch/assets/31054793/e06cc6e4-2bba-488b-b15d-1a55c881e40f)

Old: ![Screenshot 2024-06-21 at 3 16 45 PM](https://github.com/pytorch/pytorch/assets/31054793/22eb1bcc-f128-459f-961c-f4f6ded00aab)
cc @XilunWu @H-Huang @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @penguinwu @tianyu-l @yf225 @chauhang
Differential Revision: D59012616