
Grouped Query Attention #128898

Closed
wants to merge 39 commits into from

Conversation

jainapurva
Contributor

@jainapurva jainapurva commented Jun 17, 2024

Approach: Using the current function declaration

Constraint: Q_Heads % KV_Heads == 0

Major change:

  • Added a new argument enable_gqa: bool to the sdpa function call
  • It gives meaning to the third-to-last (head) dimension
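The constraint above can be sketched as a small validity check (a plain-Python illustration; `gqa_shapes_ok` is a hypothetical helper, not part of the PR):

```python
# Hypothetical sketch of the shape constraint enable_gqa enforces.
# Head counts live in the third-to-last dimension of (batch, heads, seq, dim).
def gqa_shapes_ok(q_heads: int, kv_heads: int) -> bool:
    # Valid when heads are equal (regular attention) or when every KV head
    # serves exactly q_heads // kv_heads query heads (grouped attention).
    return kv_heads > 0 and q_heads % kv_heads == 0

print(gqa_shapes_ok(32, 8))  # True: Llama3-8B style grouping
print(gqa_shapes_ok(32, 5))  # False: 32 is not divisible by 5
```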

Sample use cases this would enable:
LLama3

# LLama3-8B style call to SDPA (batch, seq_len_q, seq_len_kv, and D are
# placeholder shape variables)
import torch
from torch.nn.functional import scaled_dot_product_attention

query = torch.rand(batch, 32, seq_len_q, D)
key = torch.rand(batch, 8, seq_len_kv, D)
value = torch.rand(batch, 8, seq_len_kv, D)

output = scaled_dot_product_attention(query, key, value, is_causal=True, enable_gqa=True)

# Output shape: (batch, 32, seq_len_q, D)

Design Choice:

  • Check that Query.size(-3) == Key.size(-3) == Value.size(-3), or that Query.size(-3) % Key.size(-3) == 0
  • When the numbers of heads differ, the function expands the key and value tensors to match the query tensor's head count using repeat_interleave, so the attention computation remains correct and efficient
  • By default the enable_gqa flag is set to False, which ensures that regular sdpa functionality remains unchanged
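The repeat_interleave expansion described above can be sketched in plain numpy (a minimal reference implementation for intuition, not the actual fused kernel; `sdpa_gqa` is a hypothetical name):

```python
import numpy as np

def sdpa_gqa(q, k, v, enable_gqa=False):
    """Naive scaled dot-product attention with optional GQA head expansion.

    Shapes: q is (B, Hq, Sq, D); k and v are (B, Hkv, Skv, D).
    """
    if enable_gqa and q.shape[1] != k.shape[1]:
        assert q.shape[1] % k.shape[1] == 0, "Q_Heads % KV_Heads must be 0"
        group = q.shape[1] // k.shape[1]
        # repeat_interleave along the head dimension: each KV head serves
        # `group` consecutive query heads
        k = np.repeat(k, group, axis=1)
        v = np.repeat(v, group, axis=1)
    scores = q @ k.transpose(0, 1, 3, 2) / np.sqrt(q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((2, 32, 4, 16))
k = rng.standard_normal((2, 8, 4, 16))
v = rng.standard_normal((2, 8, 4, 16))
out = sdpa_gqa(q, k, v, enable_gqa=True)
print(out.shape)  # (2, 32, 4, 16): output takes the query head count
```

The result is numerically identical to calling regular attention on K/V that were materialized to 32 heads up front; GQA just avoids storing the expanded tensors.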

Benchmarks:

  • sdpa.py: Gqa benchmark #130634
    For different batch sizes, enable_gqa=True shows a substantial improvement in the run time of sdpa:
| batch_size | q_num_heads | kv_num_heads | q_seq_len | kv_seq_len | embed_dim | forward_time (enable_gqa=True) | forward_time (enable_gqa=False) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 32 | 8 | 2048 | 2048 | 2048 | 100.71 | 119.70 |
| 8 | 32 | 8 | 2048 | 2048 | 2048 | 539.78 | 628.83 |
| 16 | 32 | 8 | 2048 | 2048 | 2048 | 1056.81 | 1225.48 |
| 32 | 32 | 8 | 2048 | 2048 | 2048 | 2099.54 | 2440.45 |
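A back-of-envelope sketch of why the gap appears (my interpretation, not from the PR: with enable_gqa=False the 8 KV heads must be materialized to 32 heads before the call, so the kernel reads 4x more K/V data; byte counts below assume fp16 and the benchmark's shapes):

```python
# Bytes occupied by the K and V tensors of shape (batch, kv_heads, seq, head_dim)
def kv_bytes(batch, kv_heads, seq_len, head_dim, bytes_per_el=2):
    return 2 * batch * kv_heads * seq_len * head_dim * bytes_per_el  # K and V

head_dim = 2048 // 32  # embed_dim split across the 32 query heads

# Baseline: K/V expanded to match the 32 query heads before calling sdpa
expanded = kv_bytes(batch=8, kv_heads=32, seq_len=2048, head_dim=head_dim)
# GQA: the original 8-head K/V tensors are used directly
grouped = kv_bytes(batch=8, kv_heads=8, seq_len=2048, head_dim=head_dim)

print(expanded // grouped)  # 4: four times less K/V data to allocate and stream
```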

(Screenshot of benchmark results, 2024-07-25)

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @chenyang78 @kadeng @chauhang @amjames


pytorch-bot bot commented Jun 17, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/128898

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 95e724e with merge base e6cddc9:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg self-requested a review June 18, 2024 23:27
@jainapurva jainapurva marked this pull request as ready for review June 20, 2024 16:20
@jainapurva
Contributor Author

@pytorch-bot rebase

@jainapurva jainapurva marked this pull request as draft June 22, 2024 18:53
@jainapurva jainapurva marked this pull request as ready for review July 1, 2024 00:53
@jainapurva jainapurva requested a review from mruberry as a code owner July 1, 2024 18:04
@jainapurva
Contributor Author

@pytorchbot rebase

@jainapurva jainapurva marked this pull request as draft July 3, 2024 21:35
@jainapurva jainapurva marked this pull request as ready for review July 8, 2024 17:42
@albanD albanD removed their request for review July 8, 2024 23:02
Contributor

@drisspg drisspg left a comment


Overall looks really good, left a few comments

I think it would be helpful to run this script and get some micro-benchmark data: https://github.com/pytorch/pytorch/blob/main/benchmarks/transformer/sdpa.py

@pytorch-bot bot added the oncall: distributed label Jul 31, 2024
@jainapurva
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

This PR updates submodules third_party/fmt

If those updates are intentional, please add "submodule" keyword to PR title/description.

@pytorch pytorch deleted a comment from pytorchmergebot Jul 31, 2024
@pytorch pytorch deleted a comment from pytorchmergebot Jul 31, 2024
@jainapurva
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

Labels: ci-no-td, ciflow/inductor, ciflow/trunk, Merged, module: dynamo, oncall: distributed, release notes: nn, Reverted