
[reland][ROCm] TunableOp for gemm_and_bias #128919

Closed · wants to merge 11 commits

Conversation

@jeffdaily (Collaborator) commented Jun 18, 2024:

Reland of #128143, but with alpha and bias initialization added to launchTunableGemmAndBias.

Thus far TunableOp was implemented for gemm, bgemm, and scaled_mm. gemm_and_bias was notably missing. This PR closes that gap.
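For context, a minimal sketch of how TunableOp is driven from Python, using the torch.cuda.tunable API that this PR's new unit test also exercises. The shapes, dtype, and F.linear call below are illustrative assumptions, not code from this PR; a biased linear is one entry point into the gemm_and_bias path.

    import torch
    import torch.nn.functional as F

    # Enable TunableOp and keep the tuning budget small for illustration.
    torch.cuda.tunable.enable(True)
    torch.cuda.tunable.set_max_tuning_iterations(10)

    x = torch.randn(64, 128, device="cuda", dtype=torch.float16)
    w = torch.randn(256, 128, device="cuda", dtype=torch.float16)
    b = torch.randn(256, device="cuda", dtype=torch.float16)

    # A biased linear may dispatch to gemm_and_bias (the cublasLt/hipblaslt
    # epilogue path); with this PR, that call becomes tunable on ROCm as well.
    y = F.linear(x, w, b)

    torch.cuda.tunable.write_file()  # persist the solutions chosen during tuning
    torch.cuda.tunable.enable(False)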

cc @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @dllehr-amd @jataylo @hongxiayang

@jeffdaily requested a review from eqy as a code owner, Jun 18, 2024 01:53

pytorch-bot bot commented Jun 18, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128919

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 1 Unrelated Failure

As of commit 7a44c2f with merge base 1491a61:

NEW FAILURE - The following job has failed:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot bot added the ciflow/rocm and module: rocm (AMD GPU support for Pytorch) labels, Jun 18, 2024
@zou3519 added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module), Jun 21, 2024
@jithunnair-amd added the rocm label (This tag is for PRs from ROCm team), Jul 1, 2024
@jithunnair-amd (Collaborator) commented:

@xw285cornell Can you please review and approve if this PR looks good?

@diwdoit commented Aug 1, 2024:

Hi @xw285cornell - would appreciate your help getting the PR reviewed.

@jithunnair-amd (Collaborator) commented:

@malfet I think @xw285cornell is OOO. Can you please approve and merge this PR?

@jithunnair-amd added the rocm priority label (high priority ROCm PRs from performance or other aspects), Aug 7, 2024
@jithunnair-amd (Collaborator) commented:

@malfet re-ping

@jithunnair-amd added the keep-going label (Don't stop on first failure, keep running tests until the end), Aug 7, 2024
@jithunnair-amd (Collaborator) commented:

@jeffdaily / @naromero77amd: I see a unit test was added for this PR: test_addmm_relu_tunableop_rocm in test_linalg.py. I have added the keep-going label to this PR to ensure it runs all unit tests, since we have some unrelated failures on ROCm CI that might prevent that unit test from running in CI. Can you please post a snippet showing that the unit tests related to this PR ran successfully in the PR's CI runs?
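For reference, one way to run just that test locally on a ROCm build (a standard invocation for PyTorch's test suite, not a command from this thread):

    python test/test_linalg.py -k test_addmm_relu_tunableop_rocm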

Comment on lines +5901 to +5916
torch.cuda.tunable.enable(True)
ordinal = torch.cuda.current_device()
filename = f"tunableop_results{ordinal}.csv"
torch.cuda.tunable.set_filename(filename)
iterations = torch.cuda.tunable.get_max_tuning_iterations()
torch.cuda.tunable.set_max_tuning_iterations(10)
self._test_addmm_impl(torch._addmm_activation, "relu", device, dtype)
# clean up, remove any file that was generated
try:
    import os
    os.remove(filename)
except FileNotFoundError:
    pass
# reset back to prior settings
torch.cuda.tunable.set_max_tuning_iterations(iterations)
torch.cuda.tunable.enable(False)
A Contributor commented:

Nit: please use try/finally to avoid altering global state if the test fails or gets interrupted.

Suggested change (the block above rewritten with try/finally):

iterations = torch.cuda.tunable.get_max_tuning_iterations()
try:
    torch.cuda.tunable.enable(True)
    ordinal = torch.cuda.current_device()
    filename = f"tunableop_results{ordinal}.csv"
    torch.cuda.tunable.set_filename(filename)
    torch.cuda.tunable.set_max_tuning_iterations(10)
    self._test_addmm_impl(torch._addmm_activation, "relu", device, dtype)
finally:
    # clean up, remove any file that was generated
    try:
        import os
        os.remove(filename)
    except FileNotFoundError:
        pass
    # reset back to prior settings
    torch.cuda.tunable.set_max_tuning_iterations(iterations)
    torch.cuda.tunable.enable(False)
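One detail worth noting in the suggestion: the get_max_tuning_iterations() read is hoisted above the try block, so the finally clause restores the true prior value even if one of the earlier calls raises.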

return c10::str(transa, transb, "_", m, "_", n, "_", k);
}

size_t GetSize(bool duplicate_inputs) const {
A Contributor commented:

Nit: I thought we were using camelCase for methods and CapitalizedCamelCase for class names (i.e., getSize rather than GetSize).

@malfet (Contributor) commented Aug 20, 2024:

@pytorchbot merge -i

@pytorch-bot bot added the ciflow/trunk label (Trigger trunk jobs on your pull request), Aug 20, 2024
@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job failed: trunk / win-vs2019-cpu-py3 / test (default, 3, 3, windows.4xlarge.nonephemeral)


@jeffdaily (Collaborator, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 1 job failed: trunk / macos-py3-arm64 / test (default, 3, 3, macos-m1-stable)


@jeffdaily (Collaborator, Author) commented:

@pytorchbot merge -f "unrelated macos cpu job failed, all other CI is known flaky or passing"

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as a last resort; instead, consider -i / --ignore-current to continue the merge while ignoring current failures, which allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels
- ciflow/rocm
- ciflow/trunk (Trigger trunk jobs on your pull request)
- keep-going (Don't stop on first failure, keep running tests until the end)
- Merged
- module: rocm (AMD GPU support for Pytorch)
- open source
- release notes: linalg_frontend (release notes category)
- rocm priority (high priority ROCm PRs from performance or other aspects)
- rocm (This tag is for PRs from ROCm team)
- triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
7 participants