sdp::SDPBackend::flash_attention support PrivateUse1 #126392

Closed

Conversation

@1274085042 (Contributor) commented May 16, 2024

pytorch-bot bot commented May 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126392

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9b7f54b with merge base a0dac3d:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@albanD albanD requested review from drisspg and removed request for albanD May 16, 2024 13:25
@1274085042 1274085042 requested a review from drisspg May 21, 2024 02:55
@mikaylagawarecki mikaylagawarecki added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 21, 2024
@drisspg drisspg requested a review from jainapurva May 21, 2024 23:59
@drisspg (Contributor) commented May 22, 2024

The current structure of this op looks like:

|-- Determine backend (CUDA, CPU, HIP, PrivateUse1)
|    |
|    |-- if PrivateUse1:
|    |      |-- handle_private_use(...)
|    |-- else:
|          |-- _fused_sdp_choice_stub(...)
|
|-- switch (backend)
     |
     |-- case cudnn_attention:
     |      |-- out_lse_softmax = at::_scaled_dot_product_cudnn_attention(...)
     |
     |-- case flash_attention:
     |      |-- if CUDA:
     |      |      |-- out_lse_softmax = at::_scaled_dot_product_flash_attention(...)
     |      |-- else (CPU):
     |            |-- return at::_scaled_dot_product_flash_attention_for_cpu(...)
     |
     |-- case efficient_attention:
     |      |-- out_and_lse = at::_scaled_dot_product_efficient_attention(...)
     |
     |-- case math:
     |      |-- return at::_scaled_dot_product_attention_math(...)
     |
     |-- default:
            |-- TORCH_CHECK(false, "No viable backend found.")
            |-- return Tensor()

I spoke with Alban offline about this, and we came to the conclusion that we want this structure:

|-- Determine backend (CUDA, CPU, HIP, PrivateUse1)
|    |
|    |-- if stub_registered(device):
|    |      |-- _fused_sdp_choice_stub(...)
|    |-- else:
|          |-- use math as the choice
|
|-- switch (backend)
     |
     |-- case cudnn_attention:
     |      |-- out_lse_softmax = at::_scaled_dot_product_cudnn_attention(...)
     |
     |-- case flash_attention:
     |      |-- if CUDA:
     |      |      |-- out_lse_softmax = at::_scaled_dot_product_flash_attention(...)
     |      |-- else (CPU):
     |            |-- return at::_scaled_dot_product_flash_attention_for_cpu(...)
     |
     |-- case efficient_attention:
     |      |-- out_and_lse = at::_scaled_dot_product_efficient_attention(...)
     |
     |-- case overridable:
     |      |-- return at::_scaled_dot_product_attention_overridable(...)
     |
     |-- case math:
     |      |-- return at::_scaled_dot_product_attention_math(...)
     |
     |-- default:
            |-- TORCH_CHECK(false, "No viable backend found.")
            |-- return Tensor()

So what does that mean for this PR? The structure looks pretty good. I made some changes in #126832 that should enable this, so once that lands we can land your updates.

The dispatching logic for the kernels will be:

  • The default choice is math (if a device doesn't register a stub, it gets routed to the math backend).
  • If a choice stub is registered, devices have the option to go to the overridable op that this PR provides. That op should do no preprocessing, but it will be run through validate_sdpa and have attn_mask converted from bool to float (see the sketch below).
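
For concreteness, here is a minimal C++ sketch of the switch half of that flow. It is only an illustration of the structure described above, not landed code: the `overridable` enum value and the signature of `at::_scaled_dot_product_attention_overridable` come from this discussion, and the variable names and abbreviated argument lists are assumptions.

```cpp
// Sketch only: `backend` is whatever a registered _fused_sdp_choice_stub
// returned, or sdp::SDPBackend::math when no stub is registered.
switch (backend) {
  case sdp::SDPBackend::cudnn_attention:
  case sdp::SDPBackend::flash_attention:
  case sdp::SDPBackend::efficient_attention:
    /* unchanged fused paths, as in the diagram above */
    break;
  case sdp::SDPBackend::overridable:
    // New case: stubs that return `overridable` (e.g. PrivateUse1 backends)
    // land here after validate_sdpa and the bool->float attn_mask conversion,
    // with no other preprocessing.
    return std::get<0>(at::_scaled_dot_product_attention_overridable(
        query_, key, value, attn_mask_, dropout_p, is_causal, scale));
  case sdp::SDPBackend::math:
    // Default route for devices without a registered choice stub.
    return std::get<0>(at::_scaled_dot_product_attention_math(
        query_, key, value, attn_mask_, dropout_p, is_causal));
  default:
    TORCH_CHECK(false, "No viable backend for scaled_dot_product_attention was found.");
    return at::Tensor();
}
```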

pytorchmergebot pushed a commit that referenced this pull request May 25, 2024
# Summary

Adds a public method to DispatchStub to check whether a function has been registered for a device. We use this new function to clean up the dispatching logic for SDPA, as well as to make the PrivateUse1 dispatching simpler:
#126392
Pull Request resolved: #126832
Approved by: https://github.com/ezyang, https://github.com/albanD
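
As a rough sketch of the selection half this enables: the summary above does not name the new method, so `is_device_supported` is an assumption here, as are the stub's argument list and the surrounding variable names.

```cpp
// Default to math; only consult the choice stub when the device actually
// registered one, so PrivateUse1 no longer needs a special case at the call site.
sdp::SDPBackend backend = sdp::SDPBackend::math;
if (_fused_sdp_choice_stub.is_device_supported(query_.device().type())) {
  backend = static_cast<sdp::SDPBackend>(_fused_sdp_choice_stub(
      query_.device().type(),
      query_, key, value, attn_mask_, dropout_p, is_causal, scale));
}
```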
@1274085042 (Contributor, Author) commented:

@drisspg could this update be landed?

@drisspg (Contributor) commented May 29, 2024

The PR I referenced above has landed; can you rebase?

@1274085042 (Contributor, Author) commented:

@pytorchmergebot rebase

@pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased flash_attention_overrideable onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout flash_attention_overrideable && git pull --rebase)

@1274085042 (Contributor, Author) commented:

@drisspg Rebased and fixed some CI issues

@@ -680,10 +684,15 @@ Tensor scaled_dot_product_attention(
        auto out_lse_softmax = at::_scaled_dot_product_flash_attention(
            query_padded, key_padded, value_padded, dropout_p, is_causal, false /*return_debug_mask*/, og_scale.as_float_unchecked());
        return post_process_flash_output(std::get<0>(out_lse_softmax), og_size);
      }
    } else if (query_.device().type() == DeviceType::PrivateUse1) {
@drisspg (Contributor) commented on this diff:
This doesn't look right to me.

It should now just be one more switch-case entry.

You will need to add the overridable backend:

```cpp
case sdp::SDPBackend::overridable:
  return std::get<0>(at::_scaled_dot_product_attention_overridable(
      ...));
```
PrivateUse1 authors would thus register a dispatch to the stub and have it return the overridable backend; by default, they would be routed to the math backend.
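
A hypothetical sketch of the backend author's side of this: all names here (`my_backend`, `my_sdp_choice`, `my_sdpa_overridable`) are invented for illustration, the choice-stub registration is only described in a comment, and `sdp::SDPBackend::overridable` plus the schema and return type of `_scaled_dot_product_attention_overridable` are assumptions pending this PR, not the op's final definition.

```cpp
#include <ATen/ATen.h>
#include <ATen/native/transformers/sdp_utils_cpp.h>  // assumed include for sdp::SDPBackend
#include <torch/library.h>

namespace my_backend {

// Choice function for this device: report that SDPA should take the
// overridable path instead of a fused CUDA/CPU kernel.
int64_t my_sdp_choice(
    const at::Tensor& query, const at::Tensor& key, const at::Tensor& value,
    const c10::optional<at::Tensor>& attn_mask, double dropout_p,
    bool is_causal, c10::optional<double> scale) {
  return static_cast<int64_t>(sdp::SDPBackend::overridable);
}

// The device-specific attention kernel behind the overridable op.
std::tuple<at::Tensor, at::Tensor> my_sdpa_overridable(
    const at::Tensor& query, const at::Tensor& key, const at::Tensor& value,
    const c10::optional<at::Tensor>& attn_mask, double dropout_p,
    bool is_causal, c10::optional<double> scale) {
  // Placeholder: a real backend launches its fused attention kernel here and
  // returns whatever the final schema requires (output, logsumexp, ...).
  return std::make_tuple(at::empty_like(query), at::empty_like(query));
}

} // namespace my_backend

// my_sdp_choice would additionally be registered to _fused_sdp_choice_stub for
// the PrivateUse1 device type through DispatchStub's registration mechanism
// (omitted here). Without that registration, the device is routed to math.
TORCH_LIBRARY_IMPL(aten, PrivateUse1, m) {
  m.impl("_scaled_dot_product_attention_overridable",
         &my_backend::my_sdpa_overridable);
}
```

With the choice stub registered, scaled_dot_product_attention on PrivateUse1 tensors would route through the overridable case; without it, the math backend is used.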

@1274085042 (Contributor, Author) commented:

@drisspg Could you please help review the PR again? Thanks!

@1274085042 (Contributor, Author) commented:

If you have any further questions, feel free to bring them up. @drisspg

@drisspg (Contributor) left a comment.

linux-foundation-easycla bot commented Jun 3, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@1274085042 1274085042 force-pushed the flash_attention_overrideable branch from d28f942 to 93e6eb4 on June 3, 2024 13:15
@1274085042 1274085042 requested a review from drisspg June 3, 2024 13:16
@drisspg (Contributor) left a comment:

One more small comment, but otherwise this is looking really good.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@1274085042 (Contributor, Author) commented:

@pytorchbot rebase -b main

@pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot (Collaborator) commented:

Tried to rebase and push PR #126392, but it was already up to date.

@1274085042 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

@pytorchmergebot (Collaborator) commented:

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

@1274085042 (Contributor, Author) commented:

@pytorchbot rebase -b main

@pytorchmergebot (Collaborator) commented:

@pytorchbot started a rebase job onto refs/remotes/origin/main. Check the current status here

@pytorchmergebot (Collaborator) commented:

Successfully rebased flash_attention_overrideable onto refs/remotes/origin/main, please pull locally before adding more changes (for example, via git checkout flash_attention_overrideable && git pull --rebase)

@1274085042 (Contributor, Author) commented:

@pytorchbot merge

@pytorchmergebot (Collaborator) commented:

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

Labels: ciflow/inductor, ciflow/trunk, Merged, open source, topic: not user facing, triaged
Projects: none yet
Development

Successfully merging this pull request may close these issues.

Add logic about PrivateUse1 in sdp::SDPBackend::flash_attention
7 participants