Prototype for export_for_training #129092
Conversation
Is there a way to get Training IR with non-strict mode, i.e. without Dynamo?
There is, but it will be non-trivial work: there is some aot-export-module-specific logic that handles buffer re-assignments, which @avikchaudhuri worked on, and we would need to port it somehow. I think ExecuTorch doesn't do non-strict today, so we might be OK with just not implementing it for now.
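For reference, a minimal sketch of what strict vs. non-strict means at the torch.export level, using the public strict= flag of torch.export.export (export_for_training in this PR only supports the strict path):

```python
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x.sin() + 1

m, args = M(), (torch.randn(3),)

# Strict mode: TorchDynamo traces the Python bytecode and installs guards.
ep_strict = torch.export.export(m, args, strict=True)

# Non-strict mode: no Dynamo; the module is traced directly under
# fake/proxy tensor modes, which is the path that would need the
# buffer-reassignment handling mentioned above.
ep_nonstrict = torch.export.export(m, args, strict=False)

print(ep_strict.graph_module.graph)
```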
@@ -1486,6 +1683,281 @@ def forward(self, *args, **kwargs):
    )

@_log_export_wrapper
@_disable_prexisiting_fake_mode
def _export_for_training(
This implementation has a lot of duplication with `_strict_export`, so I wonder what the delta is. Can we reuse code?
@@ -1348,6 +1355,196 @@ def _strict_export(
    )

def _export_to_aten_ir_make_fx(
I am trying to make sense of the difference between this function and `_export_to_aten_ir`. There is quite a lot of duplication, and I am not sure about the delta.
@tugsbayasgalan has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
@pytorchbot merge -f 'Landed internally' (Initiating merge automatically since Phabricator Diff has merged, using force because this PR might not pass merge_rules.json but landed internally)
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
In this PR, we implement the first version of the training_ir.run_decomp functionality. Since we don't return the modified buffers as extra outputs in training IR, our previous strategy of reusing the graph signature won't work; this run_decomp is in fact closer to retracing, so I reuse some of the export steps here. After this PR: export_for_training().run_decomp({}, _preserve_ops=[all 183 ops]) == export_for_predispatch() - autograd_manipulating_ops.

Differential Revision: [D59069090](https://our.internmc.facebook.com/intern/diff/D59069090)
Pull Request resolved: #129249
Approved by: https://github.com/zhxchen17
ghstack dependencies: #128077, #129092
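A hedged sketch of the workflow that commit message describes, using today's public names torch.export.export_for_training and ExportedProgram.run_decompositions (the private run_decomp/_preserve_ops spelling above predates them):

```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.lin(x))

# Training IR: non-functional, pre-dispatch ATen IR.
ep_train = torch.export.export_for_training(M(), (torch.randn(2, 4),))

# Retrace/decompose toward the functional ATen opset; an empty
# decomposition table applies no decompositions, so ops stay intact
# while the graph is still functionalized.
ep_decomposed = ep_train.run_decompositions({})
print(ep_decomposed.graph_module.graph)
```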
Differential Revision: [D59069088](https://our.internmc.facebook.com/intern/diff/D59069088)
Pull Request resolved: #129547
Approved by: https://github.com/avikchaudhuri
ghstack dependencies: #128077, #129092, #129249
Stack from ghstack (oldest at bottom):

This PR implements export_for_training, where the IR is non-functional, pre-dispatch ATen IR. The general strategy:

1. Call Dynamo to get Torch IR
2. Lift params/buffers
3. Call make_fx

TODO:

1. run_decomp doesn't work
2. Non-strict is not supported

Differential Revision: D59069087
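A minimal sketch of that three-step strategy, assuming only public tracing APIs (torch._dynamo.export and make_fx); the real implementation in torch/export/_trace.py additionally lifts params/buffers into graph inputs and builds the export signature:

```python
import torch
from torch.fx.experimental.proxy_tensor import make_fx

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.lin = torch.nn.Linear(4, 4)

    def forward(self, x):
        return torch.relu(self.lin(x))

m, x = M(), torch.randn(2, 4)

# Step 1: call Dynamo to get Torch IR (a torch-level fx.GraphModule).
gm_torch_ir, _guards = torch._dynamo.export(m)(x)

# Steps 2-3 (simplified): retrace with make_fx in pre-dispatch mode to get
# non-functional, pre-dispatch ATen IR. The real exporter lifts params and
# buffers to inputs first; here they remain module attributes for brevity.
gm_aten = make_fx(gm_torch_ir, pre_dispatch=True)(x)
print(gm_aten.graph)
```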