Add variable seqlen and sparsity parameters to jagged_sum benchmark #2324

Closed

@jananisriram (Contributor) commented Jun 20, 2024

Summary:
Modify the existing `jagged_sum` operator benchmark to optionally accept any of the following parameters: `B` (dimension 0 of the nested tensor), `M` (dimension 2 of the nested tensor), `seqlen` (maximum sequence length on the ragged dimension), or `sparsity` (average sparsity on the ragged dimension). Any of these parameters passed on the command line is held fixed, and all of the others are varied, so that every combination of the unfixed parameters can be tested together.
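The "fix what was given, sweep the rest" behavior can be pictured as a Cartesian product over per-parameter value lists. The sketch below is illustrative only: `sweep_configs` and the candidate values in `DEFAULT_SWEEP` are hypothetical and are not taken from the benchmark's actual code, which this PR does not show.

```python
import itertools
from typing import Dict, Iterator, List, Optional

# Hypothetical candidate values for each parameter. The real benchmark's
# sweep ranges are not stated in this PR.
DEFAULT_SWEEP: Dict[str, List[float]] = {
    "B": [2**k for k in range(4, 8)],       # 16, 32, 64, 128
    "M": [2**k for k in range(4, 8)],
    "seqlen": [2**k for k in range(4, 8)],
    "sparsity": [0.1, 0.3, 0.5, 0.7],
}


def sweep_configs(
    B: Optional[int] = None,
    M: Optional[int] = None,
    seqlen: Optional[int] = None,
    sparsity: Optional[float] = None,
) -> Iterator[Dict[str, float]]:
    """Yield one config per combination (hypothetical helper).

    Parameters supplied by the caller (e.g. from the command line) are
    pinned to a single value; every parameter left as None ranges over
    DEFAULT_SWEEP.
    """
    fixed = {"B": B, "M": M, "seqlen": seqlen, "sparsity": sparsity}
    names = list(fixed)
    # A fixed parameter contributes a one-element axis to the product,
    # so it stays constant while the other axes vary.
    axes = [
        [fixed[name]] if fixed[name] is not None else DEFAULT_SWEEP[name]
        for name in names
    ]
    for values in itertools.product(*axes):
        yield dict(zip(names, values))
```

For example, `sweep_configs(B=1024, M=1024, sparsity=0.3)` pins three parameters and yields four configs, one per hypothetical `seqlen` value; calling it with no arguments yields all 4 × 4 × 4 × 4 = 256 combinations.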

The following errors persist with sufficiently large inputs:

- `RuntimeError: numel needs to be smaller than int32_t max; otherwise, please use packed_accessor64` (when running the command `buck2 run @mode/{opt,inplace} //pytorch/benchmark:triton -- --op jagged_sum --B 1024 --M 1024 --sparsity 0.3`)
- `torch.OutOfMemoryError: CUDA out of memory.`
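A rough way to see when an input crosses the `int32_t` limit is to estimate the element count of the jagged values tensor. The sketch below rests on two assumptions that this PR does not state: the values are stored as a dense `(total_length, M)` tensor, and `sparsity` is the expected fraction of the `seqlen` positions per batch element that are empty, so the average ragged length is about `seqlen * (1 - sparsity)`. The helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value representable in int32_t


def estimated_numel(B: int, M: int, seqlen: int, sparsity: float) -> int:
    """Expected element count of a (total_length, M) values tensor,
    assuming an average ragged length of seqlen * (1 - sparsity)."""
    avg_len = seqlen * (1.0 - sparsity)
    return int(B * avg_len * M)


def fits_int32(B: int, M: int, seqlen: int, sparsity: float,
               worst_case: bool = True) -> bool:
    """Check the element count against int32_t max (hypothetical helper).

    worst_case=True assumes every sequence reaches seqlen; otherwise the
    sparsity-based expectation is used. The comparison is strict, taking
    the "smaller than int32_t max" wording of the error at face value.
    """
    if worst_case:
        numel = B * seqlen * M
    else:
        numel = estimated_numel(B, M, seqlen, sparsity)
    return numel < INT32_MAX
```

With `B = M = 1024` the product `B * M` is already 2**20, so under these assumptions the worst case overflows once `seqlen` reaches 2048. At 4 bytes per `float32` element (dtype also assumed), 2**31 elements take 8 GiB for the values alone, before any intermediate buffers.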

Reviewed By: davidberard98

Differential Revision: D58772201

@facebook-github-bot (Contributor)

This pull request was exported from Phabricator. Differential Revision: D58772201

@facebook-github-bot (Contributor)

This pull request has been merged in 1425f68.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants