Extend support to varying block sizes on both dimensions for 2D matrices #2302
Summary:
Extend support for reducing across individual dimensions on 2-dimensional matrices by allowing for varying block sizes on both the `M` (first) and `N` (second) dimensions.

The existing kernel performed a simplified reduction, assuming that the entire reduction dimension fit within one thread block. The new kernel implementation removes the need for this assumption, allowing both the reduction and the non-reduction dimensions to span multiple thread blocks. This implementation also enables autotuning on block sizes for both the `M` and `N` dimensions.

For 1D results, add a `sum_then_buffer` configuration that selects which kernel variant to run. `Sum_then_buffer` sums individual blocks of input and adds these sums into a buffer. `Buffer_then_sum` adds blocks of raw input into a buffer, then reduces the buffer.

Reviewed By: davidberard98
Differential Revision: D58313958
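
The two 1D-result strategies named in the summary can be illustrated with a small CPU-only sketch. This is not the kernel from this change: it is a plain-Python stand-in that assumes a reduction over the `M` dimension of a non-empty matrix, and the function name `reduce_dim0` and its parameters `block_m`, `block_n`, and `sum_then_buffer` are hypothetical names chosen for illustration. Bounds checks on the tile ranges play the role that load masks would play in a GPU kernel.

```python
def reduce_dim0(x, block_m, block_n, sum_then_buffer):
    """Sum an M x N matrix (a non-empty list of rows) over its first dimension.

    Hypothetical illustration only: tiles of size block_m x block_n are
    visited one at a time, mimicking a blocked reduction.
    """
    m, n = len(x), len(x[0])
    out = [0.0] * n
    # Each n0 iteration stands in for one block along the non-reduction (N) dimension.
    for n0 in range(0, n, block_n):
        cols = range(n0, min(n0 + block_n, n))  # clip the tile at the matrix edge
        if sum_then_buffer:
            # Strategy 1: reduce each tile first, then accumulate into a 1D buffer.
            buf = [0.0] * block_n
            for m0 in range(0, m, block_m):
                rows = range(m0, min(m0 + block_m, m))
                for j in cols:
                    buf[j - n0] += sum(x[i][j] for i in rows)
        else:
            # Strategy 2: accumulate raw tiles into a 2D buffer, then reduce once.
            buf2d = [[0.0] * block_n for _ in range(block_m)]
            for m0 in range(0, m, block_m):
                for i in range(m0, min(m0 + block_m, m)):
                    for j in cols:
                        buf2d[i - m0][j - n0] += x[i][j]
            buf = [sum(row[c] for row in buf2d) for c in range(block_n)]
        # Write back only the in-bounds columns of this tile.
        for j in cols:
            out[j] = buf[j - n0]
    return out


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12], [13, 14, 15]]
a = reduce_dim0(x, block_m=2, block_n=2, sum_then_buffer=True)
b = reduce_dim0(x, block_m=2, block_n=2, sum_then_buffer=False)
```

Both calls return the same column sums. The difference is only in the accumulator: the first keeps a 1D buffer and pays for a small reduction on every tile, while the second keeps a full tile-sized buffer and reduces it once at the end.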