[inductor] refine loop split logic #128812
base: gh/zhuhaozhe/39/base
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128812
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure
As of commit 685e7d1 with merge base 32f45f0: NEW FAILURE - The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: a0ffb42b1c0b2159b72f278aa4184ab75325cd03 Pull Request resolved: #128812
ghstack-source-id: ae8e67d681d811c0cd0ed703d186ddbe8e39f854 Pull Request resolved: #128812
ghstack-source-id: ff1dcca4bbb2cf3100f86bf622b492f73df3ad16 Pull Request resolved: #128812
ghstack-source-id: 39d237a5cf04be275029125ef488469b2f430dda Pull Request resolved: #128812
ghstack-source-id: 6baf7b0426bbcc1ea0c06180b393ecb4619bb59d Pull Request resolved: #128812
ghstack-source-id: 8254f219519f68724f941713938b04d9d44c53ac Pull Request resolved: #128812
ghstack-source-id: 470238141e894f1cd0ea1c798987c229020dccf4 Pull Request resolved: #128812
torch/_inductor/codegen/cpp.py
Outdated
    assert deepest_proxy is not None
    return deepest_proxy

deepest_proxy = find_deepest_proxy(cpp_kernel_proxy_list)
Why do we have to find the deepest kernel proxy?
We do not need it now, since we will let the `LoopNest` in `OuterFusedKernel` start from depth 0 and we no longer need the `fusion depth`.
Previously, we would lose the `LoopLevel` to generate if we did not choose the deepest kernel proxy here.
ghstack-source-id: 7c3963eca96d94f8708064acff585d141e097332 Pull Request resolved: #128812
ghstack-source-id: 1c26422a26460a6d862fbdd8bde1a5401b950b01 Pull Request resolved: #128812
ghstack-source-id: d3a393e324eb6c991988876f7a030bb502b6c8c2 Pull Request resolved: #128812
ghstack-source-id: 7ad611d5f734ea372a5b151fb838e1c870bd2965 Pull Request resolved: #128812
ghstack-source-id: c1813f3d77fd592337afdd5680fa81855a8af8d5 Pull Request resolved: #128812
ghstack-source-id: 75e5bb00666a71450e4fd3f23238f3d67258194d Pull Request resolved: #128812
ghstack-source-id: 194ee307738c834fa2c1a54a19a2ae32ffcd35c6 Pull Request resolved: #128812
ghstack-source-id: 037af8d2d6965266a54d4be8c7e50296cd4f6422 Pull Request resolved: #128812
ghstack-source-id: a6eb457e0c029cf912fc404d6246dfb58a747c7d Pull Request resolved: #128812
This PR aims to improve parallelization by collapsing the vectorized loop (#122281).
In the case this PR targets, the parallel level is only `2`, and the vectorized loop cannot be collapsed.
After this PR, the generated code also allows the vectorized loop to be collapsed.
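The generated code is not reproduced here, so the following is only a rough Python sketch of the general idea, not the C++ that Inductor emits. It assumes one outer loop `x0` and one vectorized loop `x1`; the function names, the sizes, and the way the main/tail check is moved into the loop body are all illustrative assumptions.

```python
def split_main_and_tail(n_outer, n_inner, vec_width):
    """Vectorized main loop and scalar tail loop are two separate loops.

    Only the outer loop forms a single iteration space, so only it can be
    distributed across threads in this sketch.
    """
    main_end = n_inner // vec_width * vec_width
    visited = []
    for x0 in range(n_outer):                      # parallel level
        for x1 in range(0, main_end, vec_width):   # vectorized main part
            visited.extend((x0, x1 + lane) for lane in range(vec_width))
        for x1 in range(main_end, n_inner):        # scalar tail part
            visited.append((x0, x1))
    return visited, n_outer                        # (elements, work items)


def merged_main_and_tail(n_outer, n_inner, vec_width):
    """One strided loop covers main and tail, so it can be collapsed with x0."""
    main_end = n_inner // vec_width * vec_width
    # The collapsed iteration space: every (x0, x1) pair is one work item.
    work_items = [
        (x0, x1)
        for x0 in range(n_outer)
        for x1 in range(0, n_inner, vec_width)
    ]
    visited = []
    for x0, x1 in work_items:
        if x1 < main_end:                          # main part: full vector
            visited.extend((x0, x1 + lane) for lane in range(vec_width))
        else:                                      # tail part: scalar loop
            visited.extend((x0, t) for t in range(x1, n_inner))
    return visited, len(work_items)
```

With 2 outer iterations, 10 inner elements and a hypothetical vector width of 4, both variants visit the same elements, but the merged form exposes 6 work items to distribute instead of 2.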
Highlight
For the reduction case, this change has a side effect.
Consider a case where the `x1` dim is vectorized and the reduction is along the `x2` dim. After the collapse, the loop order will be `x1 -> x2 -> x1_tail_part`, so we need a `tmp_acc_arr` to store the reduction result for `x1_tail_part`. For `reduction_stores`, we also need to check `x1`'s value, as we do in the `loopbody`, since the `reduction_stores` happen between the `x1` and `x2` loops.
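The reduction side effect can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code the PR generates: `row_sums`, `acc_vec`, `in_main`, and the list-of-lists input are hypothetical, and only `tmp_acc_arr` and the loop order `x1 -> x2 -> x1_tail_part` come from the description.

```python
def row_sums(data, vec_width):
    """Sum data[x1][x2] over x2 for every x1, with x1 as the vectorized dim."""
    n1 = len(data)
    n2 = len(data[0]) if n1 else 0
    main_end = n1 // vec_width * vec_width
    out = [0] * n1

    for x1 in range(0, n1, vec_width):             # strided (collapsible) x1 loop
        in_main = x1 < main_end                    # check on x1's value
        acc_vec = [0] * vec_width                  # accumulator for a full vector
        tmp_acc_arr = [0] * vec_width              # one accumulator per tail element

        for x2 in range(n2):                       # reduction loop
            if in_main:
                for lane in range(vec_width):
                    acc_vec[lane] += data[x1 + lane][x2]
            else:
                # x1_tail_part runs inside the x2 loop, so each tail element
                # needs its own slot that survives across x2 iterations.
                for x1_tail in range(x1, n1):
                    tmp_acc_arr[x1_tail - x1] += data[x1_tail][x2]

        # The reduction store sits between the x1 and x2 loops, so it has to
        # repeat the same check on x1 that the loop body uses.
        if in_main:
            for lane in range(vec_width):
                out[x1 + lane] = acc_vec[lane]
        else:
            for x1_tail in range(x1, n1):
                out[x1_tail] = tmp_acc_arr[x1_tail - x1]
    return out
```

A single scalar accumulator would not be enough for the tail here, because several `x1` tail elements are updated in every `x2` iteration before any of them is stored.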
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang