Fast path detach()/alias() in FakeTensor #128281
Actionable: attempt the following approaches.
A few findings:

1. I tried fast-pathing detach to avoid the decomps (only step 1 above, not step 2) by adding a fast path for FakeTensor.detach() and temporarily turning off the python dispatcher, and did not see much of an overall speedup.
2. Looking at the svg, you can see that the majority (~2/3) of the calls to TensorBase::detach() come from snapshot_fake().
3. I updated snapshot_fake() to directly call the same fast_detach() (with no decomps), and I see a much larger speedup.
compile time before: 23.820
Fixes #128281, see investigation at #128281 (comment).

Benchmark:

```
python benchmarks/dynamo/huggingface.py --performance --timing --explain --backend aot_eager --device cuda --training --float32 --only BertForMaskedLM
```

Time before:

```
TIMING: entire_frame_compile:30.85435 backend_compile:23.98599 total_wall_time:30.85435
```

Time after:

```
TIMING: entire_frame_compile:24.35898 backend_compile:18.15235 total_wall_time:24.35898
```
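A quick sanity check of what those TIMING lines imply, computed from the numbers in the benchmark output above:

```python
# Speedup implied by the before/after timings in the PR description.
before = {"entire_frame_compile": 30.85435, "backend_compile": 23.98599}
after = {"entire_frame_compile": 24.35898, "backend_compile": 18.15235}

for key in before:
    pct = 100 * (before[key] - after[key]) / before[key]
    print(f"{key}: {pct:.1f}% faster")
```

That is roughly a 21% reduction in end-to-end compile time for this model, with an even larger relative improvement in the backend_compile portion.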
🐛 Describe the bug
We call detach()/alias() for a variety of administrative purposes, typically because we need a copy of a tensor's metadata that won't be modified by subsequent metadata mutation. This is currently implemented quite inefficiently.
We should have a fast path for this that bypasses performing a view() on the tensor.
High priority for compile time improvements.
Versions
main
cc @gchanan @zou3519 @kadeng @msaroufim @bdhirsh @anijain2305 @chauhang @eellison