[Inductor] support masked vectorization for the tail_loop #126526
base: gh/jiayisunx/10/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126526
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 3 Unrelated Failures as of commit f01286e with merge base ae708e9:
NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
UNSTABLE - The following job failed but was likely due to flakiness present on trunk and has been marked as unstable:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
ghstack-source-id: 4e2aa6dfafd14ce90a1dc5b91cb4cbd59ad35628 Pull Request resolved: #126526
ghstack-source-id: 6d8d821a710fe38916d82149fd6a4947d33c5447 Pull Request resolved: #126526
ghstack-source-id: d468379dceab2966884ac812b9a64817151e9002 Pull Request resolved: #126526
Hi @jgong5, @CaoE made some changes to the welford reduction part to fix a performance regression, could you please help review it again? Thanks!
@jiayisunx @CaoE Thanks. Do you mind sharing the generated code for this?
Sure, I updated the generated code in the top comment.
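For context on the "welford reduction" mentioned above, here is a minimal plain-Python sketch of Welford's running mean/variance update (a simplified scalar version for illustration only, not the vectorized Inductor implementation discussed in this thread):

```python
# Welford's online algorithm: combine one new observation x into a
# running (mean, m2, weight) state without a second pass over the data.
# m2 accumulates the sum of squared deviations from the current mean.
def welford_combine(state, x):
    mean, m2, weight = state
    weight += 1
    delta = x - mean
    mean += delta / weight
    m2 += delta * (x - mean)
    return mean, m2, weight

state = (0.0, 0.0, 0)
for v in [1.0, 2.0, 3.0, 4.0]:
    state = welford_combine(state, v)
mean, m2, weight = state
print(mean, m2 / weight)  # 2.5 1.25 (mean and population variance)
```

Vectorizing this reduction is what makes the tail loop interesting: the lane-wise partial states must be combined correctly even when the tail group has fewer valid lanes than the vector width.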
ghstack-source-id: de3061b3de1ad8e7032111cff88ab43af9aba0af Pull Request resolved: pytorch#126526
@@ -2146,6 +2142,8 @@ def __init__(
         tiling_factor = self.vec_isa.nelements(dtype=tiling_dtype)
         self.tiling_factor = tiling_factor
         self.tiling_idx = tiling_idx
+        self.tail_size = tail_size
+        self.num_elems = tail_size if tail_size else tiling_factor
How do we handle the case when self.tail_size is symbolic? This seems problematic when we use self.tail_size as a constant in the code, like in the GCC unroll pragma.
I checked has_free_symbols in CppVecKernelChecker; for dynamic shapes, the tail loop still uses the scalar kernel.
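The guard described above can be sketched as follows. This is a hypothetical simplification, not the actual CppVecKernelChecker code: real Inductor loop ranges are sympy expressions, and here a plain string stands in for a symbolic size.

```python
# Hypothetical sketch of the dynamic-shape guard (assumed names:
# has_free_symbols / pick_tail_kernel are illustrative, not Inductor's API).
def has_free_symbols(ranges):
    # A range is "symbolic" if it is not a concrete integer.
    return any(not isinstance(r, int) for r in ranges)

def pick_tail_kernel(ranges):
    # Masked vectorization emits the tail size as a compile-time
    # constant (e.g. for "#pragma GCC unroll N"), which is impossible
    # for a symbolic size, so dynamic shapes fall back to the scalar kernel.
    return "scalar" if has_free_symbols(ranges) else "masked_vec"

print(pick_tail_kernel([1024, 16]))      # static shape -> masked_vec
print(pick_tail_kernel(["s0 + 1", 16]))  # dynamic shape -> scalar
```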
@@ -2828,6 +2942,9 @@ def store(self, name, index, value, mode=None):
         return self.simd_vec

+    def reduction(self, dtype, src_dtype, reduction_type, value):
+        if has_free_symbols(self.ranges):
OK. Are you going to enable this for dynamic shapes in a future PR? I don't think it would be very hard to do.
@jansel, could you please review this PR? Thanks!
Stack from ghstack (oldest at bottom):
Currently the tail_loop always uses the scalar kernel. This PR supports masked vectorization for the tail_loop to improve performance.
Example:
Generated code:
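To illustrate the idea (in plain Python rather than the actual generated C++, and with an assumed lane width of 4), the difference is that the tail group shorter than the vector width is processed with per-lane masks instead of falling back to a scalar element-by-element loop:

```python
# Hypothetical illustration of masked tail-loop vectorization.
# LANES stands in for the SIMD width (e.g. 16 floats for AVX-512).
LANES = 4

def masked_sum(data):
    n = len(data)
    acc = [0.0] * LANES  # one partial accumulator per lane
    i = 0
    # Main loop: full vector-width groups.
    while i + LANES <= n:
        for lane in range(LANES):
            acc[lane] += data[i + lane]
        i += LANES
    # Tail group: same vector-shaped body, but lanes past the end of
    # the data are masked off (a masked load reads them as 0).
    if i < n:
        tail = n - i  # number of valid lanes in the tail group
        for lane in range(LANES):
            acc[lane] += data[i + lane] if lane < tail else 0.0
    return sum(acc)  # horizontal reduction across lanes

print(masked_sum([float(x) for x in range(1, 11)]))  # 55.0
```

With n = 10 and 4 lanes, the main loop covers 8 elements and the 2-element tail runs through the same vector-shaped body with two lanes masked, rather than a separate scalar loop.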
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang