
[aota] Needs autograd if an input requires_grad, agnostic to enable_grad #128890

Closed
wants to merge 12 commits

Conversation


@IvanKobzarev (Contributor) commented on Jun 17, 2024

Stack from ghstack (oldest at bottom):

Original issue: #114338

Reland of: #128016

Summary from previous PR:
We assume only two possible, mutually exclusive scenarios:

1. Running the compiled region for training (any input has requires_grad): produced differentiable outputs should have requires_grad.

2. Running the compiled region for inference (no input has requires_grad): no output has requires_grad.

Even if the user runs the region under no_grad(), an input Tensor with requires_grad puts us in training scenario (1).

In the current state that means:
1/ needs_autograd should not check torch.is_grad_enabled(), only whether any input has requires_grad
2/ if needs_autograd => trace_joint (we are in training scenario 1) => always run the compiled region under torch.enable_grad()
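
A minimal sketch of that dispatch decision (hypothetical helper names, not the actual AOTAutograd code):

```python
import torch

def needs_autograd(flat_args) -> bool:
    # Deliberately do NOT consult torch.is_grad_enabled(); only the inputs matter.
    return any(isinstance(a, torch.Tensor) and a.requires_grad for a in flat_args)

def run_compiled_region(compiled_fn, flat_args):
    if needs_autograd(flat_args):
        # Training scenario (1): always execute with grad enabled,
        # even if the caller is inside torch.no_grad().
        with torch.enable_grad():
            return compiled_fn(*flat_args)
    # Inference scenario (2): outputs will not require grad.
    return compiled_fn(*flat_args)
```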

Changes in the partitioner?

Inference and training graphs used to differ in their return container (list vs. tuple).
The partitioner changes unify this so that a tuple is always returned.
As a result, some expected graph contents in test_aotdispatch.py change from list to tuple (see the illustration below).
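
As a toy illustration of the return container the expecttests capture (this traces an ordinary function with make_fx and is not the partitioner code itself):

```python
# Toy illustration only: FX graphs return their outputs in a container;
# after this PR both inference and training graphs use a tuple.
import torch
from torch.fx.experimental.proxy_tensor import make_fx

def f(x):
    return x + 1, x * 2

gm = make_fx(f)(torch.randn(3))
print(gm.code)  # the final line is a tuple return, e.g. `return (add, mul)`
```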

Why was it reverted?

There was an inference regression for the hf_Reformer model:

```
TORCHINDUCTOR_FX_GRAPH_CACHE=0 python benchmarks/dynamo/torchbench.py --performance --inference --bfloat16 --backend inductor --device cuda --only hf_Reformer --cold-start-latency --use-eval-mode
```

One of the compiled graphs contained outputs that are aliases of inputs which are nn.Parameter(requires_grad=True).

Even though the torchbench inference benchmarks run inside torch.no_grad(), alias ops (specifically expand, in hf_Reformer's case) preserve requires_grad.

As a result, we started compiling a training graph instead of an inference graph.
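
A small repro of the aliasing behavior described above (plain PyTorch, no compilation involved):

```python
import torch

p = torch.nn.Parameter(torch.randn(2, 3))  # requires_grad=True

with torch.no_grad():
    out = p.expand(4, 2, 3)  # expand is a view/alias op

# Views/aliases keep the base tensor's requires_grad even under no_grad(),
# so this output looks like a "training" output to the old check.
print(out.requires_grad)  # True
```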

Fix for view ops:

If outputs are aliases of inputs that require grad, their requires_grad is not by itself a reason to generate a training graph.

This is handled in aot_autograd.py, where output_and_mutation_safe is computed.
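
A hedged sketch of that filtering; the field names below are illustrative, the real logic lives in aot_autograd.py where output_and_mutation_safe is computed:

```python
# Illustrative only: ignore outputs that merely alias an input when deciding
# whether requires_grad on an output forces a training graph. AOTAutograd
# regenerates such view outputs at runtime via view-replay, which sets their
# grad_fn correctly, so they can safely be left out of this decision.
def any_output_forces_training_graph(output_info) -> bool:
    return any(
        o.requires_grad and not o.is_alias_of_input  # hypothetical flag
        for o in output_info
    )
```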

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse


pytorch-bot bot commented Jun 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128890

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 9c5de20 with merge base e6cddc9:

UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

IvanKobzarev added a commit that referenced this pull request Jun 17, 2024
ghstack-source-id: 35d40650db81040f8eca0476194ffc7f918be565
Pull Request resolved: #128890

ezyang commented Jun 17, 2024

I had an instinctive dislike of the approach taken here, but after thinking about it I couldn't figure out any other way to do it lol. We should make sure we have hit all the paths that actually need enable_grad, and I guess we need to make sure there aren't funny no_grad / view interactions (cc @albanD). This could be right, but a more detailed case analysis of the various situations would make me feel more confident about it.

@IvanKobzarev added the `topic: not user facing` label on Jun 20, 2024
@IvanKobzarev (Contributor, Author):

About funny view interactions:

Just found that view-type ops preserve requires_grad even under no_grad. That caused one of the torchbench models to start generating a training graph under no_grad, since it had outputs that were torch.expand of nn.Parameter(requires_grad=True) inputs :)

```
@@ -346,6 +346,8 @@ def load_model(
model.train()
else:
model.eval()
model.requires_grad_(False)
```
Contributor:

hmmm i'm not convinced we want to do this... since this will just paper over any bugs that we have. If torchbench gets bad inference performance unless we manually set all parameters to not require grad, it should show up as a regression so we can actually diagnose and fix it.

Contributor Author:

Oh, sorry, forgot to remove this. Will clean it.

@bdhirsh bdhirsh requested a review from a team June 20, 2024 15:11
IvanKobzarev added a commit that referenced this pull request Jun 20, 2024
ghstack-source-id: a5a0e9110ea3697c672eeba9bf13959b6b52ba90
Pull Request resolved: #128890
```
x.requires_grad for x in fw_metadata.output_info
x.requires_grad
# view operations preserve requires_grad even in no_grad.
# Do not count aliases of inputs with requires_grad as reason to make a training graph.
```
Contributor:

nit: I would mention in the comment that AOTAutograd will perform view-replay to regenerate the view outputs at runtime, setting their grad_fn properly.

Otherwise, it's not clear just from the comment why it's OK to "ignore" views that require grad when deciding that we can generate an inference graph here.

Contributor Author:

Thanks, added.

IvanKobzarev added a commit that referenced this pull request Jun 24, 2024
ghstack-source-id: d435aee0988324af359380c6b360ba0b1c8aa10d
Pull Request resolved: #128890

bdhirsh commented Jun 24, 2024

chatted offline, but failures are because you need to update the expecttests. you can do so with EXPECTTEST_ACCEPT=1

IvanKobzarev added a commit that referenced this pull request Jun 24, 2024
ghstack-source-id: 945f8691c094e78de2c8797a259f6ca5b3259cf8
Pull Request resolved: #128890
@bdhirsh bdhirsh requested a review from a team June 24, 2024 23:15
IvanKobzarev added a commit that referenced this pull request Jul 9, 2024
ghstack-source-id: 3ecaa4defa64e4917eb821d6645d5988b1ba6270
Pull Request resolved: #128890

zou3519 commented Jul 18, 2024

Seems to have broken tests on trunk. Example logs: https://ossci-raw-job-status.s3.amazonaws.com/log/27619471188


zou3519 commented Jul 18, 2024

@pytorchbot help


pytorch-bot bot commented Jul 18, 2024

❌ 🤖 pytorchbot command failed:

```
@pytorchbot: error: argument command: invalid choice: 'help' (choose from 'merge', 'revert', 'rebase', 'label', 'drci', 'cherry-pick', 'close')

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

Try @pytorchbot --help for more info.
```


zou3519 commented Jul 18, 2024

@pytorchbot revert -m "broke trunk tests, probably a landrace" -c landrace

@pytorchmergebot (Collaborator):

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot (Collaborator):

@IvanKobzarev your PR has been successfully reverted.

pytorchmergebot added a commit that referenced this pull request Jul 18, 2024
…enable_grad (#128890)"

This reverts commit e98135d.

Reverted #128890 on behalf of https://github.com/zou3519 due to broke trunk tests, probably a landrace ([comment](#128890 (comment)))
@IvanKobzarev added the `ciflow/periodic` label on Jul 18, 2024
IvanKobzarev added a commit that referenced this pull request Jul 18, 2024
ghstack-source-id: 0c9c536017361d0e14837b7b3abd8ad9b15f8f44
Pull Request resolved: #128890
DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
…rad (pytorch#128890)

DiweiSun pushed a commit to DiweiSun/pytorch that referenced this pull request Jul 22, 2024
…enable_grad (pytorch#128890)"

bigfootjon pushed a commit that referenced this pull request Jul 24, 2024
…rad (#128890)


(cherry picked from commit e98135d)
bigfootjon pushed a commit that referenced this pull request Jul 24, 2024
…enable_grad (#128890)"

(cherry picked from commit 120fdf7)
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…rad (pytorch#128890)

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…rad (pytorch#128890)

xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…enable_grad (pytorch#128890)"

IvanKobzarev added a commit that referenced this pull request Jul 30, 2024
ghstack-source-id: 8c1bb666dd096a59f00f85f675f2af509b9ef94b
Pull Request resolved: #128890
@IvanKobzarev (Contributor, Author):

@pytorchbot merge

@pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/periodic (Trigger jobs ran periodically on master (periodic.yml) on the PR), ciflow/trunk (Trigger trunk jobs on your pull request), Merged, module: dynamo, module: inductor, oncall: distributed (Add this issue/PR to distributed oncall triage queue), release notes: AO frontend, Reverted, topic: not user facing
6 participants