Add reduction tutorial #3513
base: docs/develop
@neon60 I have responded to all your comments. Let me know if I missed something.
Two-pass reduction
------------------

Alter kernel launch and input fetching such that no more blocks are launched than what a subsequent kernel launch's single block can conveniently reduce, while performing multiple passes of input reading from global and combining their results before engaging in the end game tree-like reduction.
Something is off in this sentence. I can't quite understand it. Difficult to comment.
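One possible reading of the paragraph under review, written as a CPU-only C++ sketch rather than HIP device code. It assumes the operator is associative and commutative and that `zero` is its identity. The names `kBlockSize`, `kMaxBlocks`, `tree_reduce` and `two_pass_reduce` are hypothetical and do not come from the tutorial. Pass one caps the block count and lets each simulated thread stride over the input several times before the block collapses its values; pass two reduces the per-block partials the way a single block would.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sizes, chosen only for illustration.
constexpr std::size_t kBlockSize = 8;  // simulated threads per block
constexpr std::size_t kMaxBlocks = 4;  // most partials one block reduces in pass two

// Tree-like reduction: each step folds the upper half onto the lower half.
// Assumes op is associative and commutative.
template <typename T, typename F>
T tree_reduce(std::vector<T> values, F op, T zero)
{
    std::size_t n = values.size();
    if (n == 0) return zero;
    while (n > 1) {
        const std::size_t half = (n + 1) / 2;
        for (std::size_t i = 0; i < n / 2; ++i)
            values[i] = op(values[i], values[i + half]);
        n = half;
    }
    return values[0];
}

// Two-pass reduction, simulated sequentially. zero must be the identity of op.
template <typename T, typename F>
T two_pass_reduce(const std::vector<T>& input, F op, T zero)
{
    const std::size_t n = input.size();
    // Launch no more blocks than the second pass can reduce in a single block.
    const std::size_t needed = (n + kBlockSize - 1) / kBlockSize;
    const std::size_t num_blocks =
        std::max<std::size_t>(1, std::min(kMaxBlocks, needed));
    assert(num_blocks <= kMaxBlocks);
    const std::size_t stride = num_blocks * kBlockSize;

    // Pass one: every thread reads the input in several strided passes and
    // combines what it read, then the block tree-reduces its thread values.
    std::vector<T> partials(num_blocks, zero);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        std::vector<T> per_thread(kBlockSize, zero);
        for (std::size_t t = 0; t < kBlockSize; ++t)
            for (std::size_t i = b * kBlockSize + t; i < n; i += stride)
                per_thread[t] = op(per_thread[t], input[i]);
        partials[b] = tree_reduce(per_thread, op, zero);
    }

    // Pass two: a single block reduces the per-block partial results.
    return tree_reduce(partials, op, zero);
}
```

On a GPU the outer two loops of pass one would run in parallel and `partials` would live in global memory between the two kernel launches; the sketch only shows how the work is divided.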
This modification can only be executed on AMD hardware.

Perform the first step of the two-pass reduction, but in the end, instead of writing to global and reading it back in a subsequent kernel, write the partial results to the Global Data Share (GDS). This is an ``N+1`` th shared memory that is accessed by all multiprocessors and is also on-chip memory.
Perform the first step of the two-pass reduction, but in the end, instead of writing to global and reading it back in a subsequent kernel, write the partial results to the Global Data Share (GDS). This is an ``N+1`` th shared memory that is accessed by all multiprocessors and is also on-chip memory.
Perform the first step of the two-pass reduction, but in the end, instead of writing to global and reading it back in a subsequent kernel, write the partial results to the Global Data Share (GDS). This is an ``N+1`` **th** shared memory that is accessed by all multiprocessors and is also on-chip memory.
what is the th?
I would not make it bold. The problem: https://www.reddit.com/r/math/comments/nw5lb1/n1st_or_n1th/
Do you think it's better to show the numbers in the squares on the third row? Instead of f(z, 5) it should be 5, and f(13, 8) should be 13.
@gandryey I will be able to check this tonight. If you have more comments, please share.
Should we show the numbers in the squares, since the full function is already written on the previous step? f(z,5) should be 5 and f(f(z,5),13) should be just 13.
Co-authored-by: Leo Paoletti <[email protected]>
Add reduction tutorial