
Prototype for export_for_training #129092

Closed

Conversation

tugsbayasgalan (Contributor) commented Jun 19, 2024

Stack from ghstack (oldest at bottom):

This PR implements export_for_training, where the IR is non-functional, pre-dispatch ATen IR. The general strategy (a rough sketch is included below):

  1. Call dynamo to get Torch IR
  2. Lift params/buffers
  3. Call make_fx

TODO:

  1. run_decomp doesn't work yet
  2. Non-strict mode is not supported

Differential Revision: D59069087
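As a rough illustration of the three steps above (not the actual code in this PR), the sketch below wires a similar flow together out of public-ish pieces: torch._dynamo.export for step 1, torch.func.functional_call as a stand-in for the parameter/buffer lifting pass in step 2, and make_fx(pre_dispatch=True) for step 3. The toy module M, the example inputs, and the lifted helper are made up for illustration; the real _export_for_training runs make_fx over the dynamo-captured graph and uses dedicated export passes for lifting.

```python
import torch
import torch._dynamo
from torch.fx.experimental.proxy_tensor import make_fx


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


m = M()
example_inputs = (torch.randn(2, 4),)

# Step 1: call dynamo to get Torch IR (an fx.GraphModule of torch-level ops).
torch_ir_gm, guards = torch._dynamo.export(m)(*example_inputs)

# Step 2: "lift" params/buffers so they become explicit graph inputs instead of
# module attributes. Here we approximate the lifting pass by routing the call
# through torch.func.functional_call with the state passed in as arguments.
state = {**dict(m.named_parameters()), **dict(m.named_buffers())}
names, values = list(state.keys()), list(state.values())


def lifted(*flat_args):
    *param_values, x = flat_args
    return torch.func.functional_call(m, dict(zip(names, param_values)), (x,))


# Step 3: call make_fx with pre_dispatch=True to get pre-dispatch ATen IR that
# is not functionalized (mutations and autograd-related ops are preserved).
training_ir_gm = make_fx(lifted, pre_dispatch=True)(*values, *example_inputs)
print(training_ir_gm.graph)
```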

pytorch-bot commented Jun 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129092

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (21 Unrelated Failures)

As of commit c5f70dd with merge base 7373492:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SherlockNoMad (Contributor) commented:

Is there a way to get Training IR with non-strict mode, i.e. without dynamo?


tugsbayasgalan (Contributor, Author) commented:

> Is there a way to get Training IR with non-strict mode, i.e. without dynamo?

There is, but it will be non-trivial work because there is some aot-export-module-specific logic that handles buffer re-assignments, which @avikchaudhuri worked on. We would need to port that somehow. I think ExecuTorch doesn't use non-strict today, so we might be OK with just not implementing it for now.
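To make the buffer re-assignment issue concrete, here is a toy module (my own illustration, not code from this PR or from aot-export-module) showing the kind of pattern that logic has to handle:

```python
import torch


class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("steps", torch.zeros(1))

    def forward(self, x):
        # Re-assigning the buffer attribute (rather than mutating it in place)
        # is what needs the extra bookkeeping mentioned above when tracing
        # without dynamo: the tracer has to notice that self.steps now refers
        # to a new tensor and reflect that in the graph signature.
        self.steps = self.steps + 1
        return x + self.steps
```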


@@ -1486,6 +1683,281 @@ def forward(self, *args, **kwargs):
)


@_log_export_wrapper
@_disable_prexisiting_fake_mode
def _export_for_training(
Contributor commented:

This implementation has a lot of duplication with _strict_export, so I wonder what the delta is. Can we reuse code?

@@ -1348,6 +1355,196 @@ def _strict_export(
)


def _export_to_aten_ir_make_fx(
Contributor commented:

I am trying to make sense of the difference between this function and `_export_to_aten_ir`.

There is quite a lot of duplication, and I am not sure what the delta is.



@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


torch/export/_trace.py (outdated review thread, resolved)


pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jun 27, 2024
facebook-github-bot (Contributor) commented:

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Jun 27, 2024
In this PR, we implement the first version of the training_ir.run_decomp functionality. Since we don't return the modified buffers as extra outputs in the training IR, our previous strategy of reusing the graph signature won't work; in fact, this run_decomp is more similar to retracing, so I reuse some of the export steps here. After this PR:
export_for_training().run_decomp({}, _preserve_ops=[all 183 ops]) == export_for_predispatch() - autograd_manipulating_ops.

Differential Revision: [D59069090](https://our.internmc.facebook.com/intern/diff/D59069090)
Pull Request resolved: #129249
Approved by: https://github.com/zhxchen17
ghstack dependencies: #128077, #129092
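For context, here is a hedged usage sketch of the workflow this commit message describes, written against the public torch.export API that landed later (torch.export.export_for_training and ExportedProgram.run_decompositions); the private run_decomp/_preserve_ops spelling from the message, and the exact set of preserved ops, are omitted, and the toy module is made up for illustration.

```python
import torch
from torch.export import export_for_training


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


# Training IR: pre-dispatch ATen with params/buffers lifted, not functionalized.
ep = export_for_training(M(), (torch.randn(2, 4),))
print(ep.graph)

# run_decompositions retraces the training IR (similar to the retracing the
# commit message describes); an empty decomposition table keeps individual ops
# un-decomposed while producing the functional ATen graph.
ep_decomposed = ep.run_decompositions({})
print(ep_decomposed.graph)
```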
pytorchmergebot pushed a commit that referenced this pull request Jun 27, 2024
github-actions bot deleted the gh/tugsbayasgalan/223/head branch on July 28, 2024 at 02:03.
Labels: ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged