Extend support to varying block sizes on both dimensions for 2D matrices #2302

Closed
wants to merge 1 commit

Conversation

jananisriram
Contributor

Summary:
Extend support for reducing across individual dimensions on 2-dimensional matrices by allowing for varying block sizes on both the `M` (first) and `N` (second) dimensions.

The existing kernel performed a simplified reduction, assuming that the entire reduction dimension fit within one thread block. The new kernel implementation removes this assumption, allowing both the reduction and the non-reduction dimensions to span multiple thread blocks. This implementation also enables autotuning on block sizes for both the `M` and `N` dimensions.
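As a rough illustration of the blocked layout (a minimal sketch, not the kernel from this PR; the kernel name, autotune configs, and `sum_dim1` wrapper are invented for the example, and a contiguous row-major input is assumed), each program can own a `BLOCK_SIZE_M` slab of rows and loop over the reduction dimension in `BLOCK_SIZE_N` chunks, with both block sizes exposed to the autotuner:

```python
import torch
import triton
import triton.language as tl


@triton.autotune(
    configs=[
        triton.Config({"BLOCK_SIZE_M": bm, "BLOCK_SIZE_N": bn})
        for bm in (1, 8, 32)
        for bn in (64, 256, 1024)
    ],
    key=["M", "N"],
)
@triton.jit
def sum_dim1_kernel(
    x_ptr, out_ptr,
    M, N,
    BLOCK_SIZE_M: tl.constexpr,
    BLOCK_SIZE_N: tl.constexpr,
):
    # Each program owns BLOCK_SIZE_M rows of a row-major (M, N) matrix.
    pid_m = tl.program_id(axis=0)
    offs_m = pid_m * BLOCK_SIZE_M + tl.arange(0, BLOCK_SIZE_M)
    acc = tl.zeros((BLOCK_SIZE_M,), dtype=tl.float32)
    # Walk the reduction (N) dimension one BLOCK_SIZE_N chunk at a time,
    # so N no longer has to fit inside a single thread block.
    for i in range(0, tl.cdiv(N, BLOCK_SIZE_N)):
        offs_n = i * BLOCK_SIZE_N + tl.arange(0, BLOCK_SIZE_N)
        mask = (offs_m[:, None] < M) & (offs_n[None, :] < N)
        tile = tl.load(x_ptr + offs_m[:, None] * N + offs_n[None, :],
                       mask=mask, other=0.0)
        acc += tl.sum(tile.to(tl.float32), axis=1)
    tl.store(out_ptr + offs_m, acc, mask=offs_m < M)


def sum_dim1(x: torch.Tensor) -> torch.Tensor:
    # Reduce a 2D tensor along dim 1; one program per block of M rows.
    M, N = x.shape
    out = torch.empty(M, device=x.device, dtype=torch.float32)
    grid = lambda meta: (triton.cdiv(M, meta["BLOCK_SIZE_M"]),)
    sum_dim1_kernel[grid](x, out, M, N)
    return out
```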

For 1D results, add a `sum_then_buffer` configuration which decides which kernel variant to run: `sum_then_buffer` sums individual blocks of input and accumulates these partial sums into a buffer, while `buffer_then_sum` accumulates blocks of raw input into a buffer and then reduces the buffer.
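To make the two accumulation orders concrete, here is a plain-PyTorch sketch of the idea (not the Triton kernel itself; the function names and `block_m` value are invented for the example), reducing a 2D input along dim 0 into a 1D result:

```python
import torch


def sum_then_buffer(x: torch.Tensor, block_m: int = 4) -> torch.Tensor:
    # Reduce each block of rows first, then accumulate the per-block
    # partial sums into a 1D buffer of shape (N,).
    buf = torch.zeros(x.shape[1], dtype=torch.float32)
    for chunk in x.split(block_m, dim=0):
        buf += chunk.sum(dim=0)
    return buf


def buffer_then_sum(x: torch.Tensor, block_m: int = 4) -> torch.Tensor:
    # Accumulate raw blocks of rows element-wise into a (block_m, N) buffer,
    # then perform a single reduction over the buffer at the end.
    buf = torch.zeros(block_m, x.shape[1], dtype=torch.float32)
    for chunk in x.split(block_m, dim=0):
        buf[: chunk.shape[0]] += chunk
    return buf.sum(dim=0)


x = torch.arange(24, dtype=torch.float32).reshape(6, 4)
assert torch.allclose(sum_then_buffer(x), buffer_then_sum(x))  # both match x.sum(dim=0)
```

Both orderings produce the same result; they differ in how much work is done per block versus deferred to a final pass, which is why the choice is exposed as a configuration.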

Reviewed By: davidberard98

Differential Revision: D58313958

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D58313958


jananisriram added a commit to jananisriram/benchmark that referenced this pull request Jun 14, 2024
Extend support to varying block sizes on both dimensions for 2D matrices (pytorch#2302)


@facebook-github-bot
Contributor

This pull request has been merged in f4cbf78.
