
Prototype for export_for_training #129092

Closed

Conversation

tugsbayasgalan (Contributor) commented Jun 19, 2024

Stack from ghstack (oldest at bottom):

This PR implements export_for_training, where the IR is non-functional, pre-dispatch ATen IR. The general strategy (a rough sketch is included below):

  1. Call dynamo to get Torch IR
  2. Lift params/buffers
  3. Call make_fx

TODO:

  1. run_decomp doesn't work yet
  2. Non-strict mode is not supported

Differential Revision: D59069087
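As a rough illustration of the three steps above (not the actual code in this PR), the sketch below wires a similar flow together out of public-ish pieces: torch._dynamo.export for step 1, torch.func.functional_call as a stand-in for the parameter/buffer lifting pass in step 2, and make_fx(pre_dispatch=True) for step 3. The toy module M, the example inputs, and the lifted helper are made up for illustration; the real _export_for_training runs make_fx over the dynamo-captured graph and uses dedicated export passes for lifting.

```python
import torch
import torch._dynamo
from torch.fx.experimental.proxy_tensor import make_fx


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


m = M()
example_inputs = (torch.randn(2, 4),)

# Step 1: call dynamo to get Torch IR (an fx.GraphModule of torch-level ops).
torch_ir_gm, guards = torch._dynamo.export(m)(*example_inputs)

# Step 2: "lift" params/buffers so they become explicit graph inputs instead of
# module attributes. Here we approximate the lifting pass by routing the call
# through torch.func.functional_call with the state passed in as arguments.
state = {**dict(m.named_parameters()), **dict(m.named_buffers())}
names, values = list(state.keys()), list(state.values())


def lifted(*flat_args):
    *param_values, x = flat_args
    return torch.func.functional_call(m, dict(zip(names, param_values)), (x,))


# Step 3: call make_fx with pre_dispatch=True to get pre-dispatch ATen IR that
# is not functionalized (mutations and autograd-related ops are preserved).
training_ir_gm = make_fx(lifted, pre_dispatch=True)(*values, *example_inputs)
print(training_ir_gm.graph)
```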

pytorch-bot commented Jun 19, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129092

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (21 Unrelated Failures)

As of commit c5f70dd with merge base 7373492:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

SherlockNoMad (Contributor) commented:

Is there a way to get Training IR with non-strict mode, i.e. without dynamo?


tugsbayasgalan (Contributor, Author) commented:

> Is there a way to get Training IR with non-strict mode, i.e. without dynamo?

There is, but it will be non-trivial work because there is some aot-export-module-specific logic that handles buffer re-assignments, which @avikchaudhuri worked on. We would need to port that somehow. I think ExecuTorch doesn't use non-strict today, so we might be OK with just not implementing it for now.
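To make the buffer re-assignment issue concrete, here is a toy module (my own illustration, not code from this PR or from aot-export-module) showing the kind of pattern that logic has to handle:

```python
import torch


class Counter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("steps", torch.zeros(1))

    def forward(self, x):
        # Re-assigning the buffer attribute (rather than mutating it in place)
        # is what needs the extra bookkeeping mentioned above when tracing
        # without dynamo: the tracer has to notice that self.steps now refers
        # to a new tensor and reflect that in the graph signature.
        self.steps = self.steps + 1
        return x + self.steps
```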


@@ -1486,6 +1683,281 @@ def forward(self, *args, **kwargs):
)


@_log_export_wrapper
@_disable_prexisiting_fake_mode
def _export_for_training(
Contributor commented:

This implementation has a lot of duplication with _strict_export, so I wonder what the delta is. Can we reuse code?

@@ -1348,6 +1355,196 @@ def _strict_export(
)


def _export_to_aten_ir_make_fx(
Contributor commented:

I am trying to make sense of the difference between this function and `_export_to_aten_ir`.

There is quite a lot of duplication, and I am not sure what the delta is.



@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


torch/export/_trace.py (outdated review thread, resolved)


pytorch-bot added the ciflow/trunk (Trigger trunk jobs on your pull request) label on Jun 27, 2024
facebook-github-bot (Contributor) commented:

@pytorchbot merge -f 'Landed internally'

(Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)

pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status.

pytorchmergebot pushed a commit that referenced this pull request Jun 27, 2024
In this PR, we implement the first version of the training_ir.run_decomp functionality. Since we don't return the modified buffers as extra outputs in the training IR, our previous strategy of reusing the graph signature won't work; in fact, this run_decomp is more similar to retracing, so I reuse some of the export steps here. After this PR:
export_for_training().run_decomp({}, _preserve_ops=[all 183 ops]) == export_for_predispatch() - autograd_manipulating_ops.

Differential Revision: [D59069090](https://our.internmc.facebook.com/intern/diff/D59069090)
Pull Request resolved: #129249
Approved by: https://github.com/zhxchen17
ghstack dependencies: #128077, #129092
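For context, here is a hedged usage sketch of the workflow this commit message describes, written against the public torch.export API that landed later (torch.export.export_for_training and ExportedProgram.run_decompositions); the private run_decomp/_preserve_ops spelling from the message, and the exact set of preserved ops, are omitted, and the toy module is made up for illustration.

```python
import torch
from torch.export import export_for_training


class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


# Training IR: pre-dispatch ATen with params/buffers lifted, not functionalized.
ep = export_for_training(M(), (torch.randn(2, 4),))
print(ep.graph)

# run_decompositions retraces the training IR (similar to the retracing the
# commit message describes); an empty decomposition table keeps individual ops
# un-decomposed while producing the functional ATen graph.
ep_decomposed = ep.run_decompositions({})
print(ep_decomposed.graph)
```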
pytorchmergebot pushed a commit that referenced this pull request Jun 27, 2024
github-actions bot deleted the gh/tugsbayasgalan/223/head branch on July 28, 2024 at 02:03.
Labels: ciflow/inductor, ciflow/trunk (Trigger trunk jobs on your pull request), Merged