Add colfax_cutlass backend to flash_attention operator #2296

Closed
wants to merge 13 commits

Conversation

xuzhao9 (Contributor) commented Jun 12, 2024

Add colfax_cutlass kernel compilation:

$ python install.py --userbenchmark triton --cutlass

Run with the sdpa, triton_tutorial_flash_v2, and colfax_cutlass backends on H100:

$ python run_benchmark.py triton --op flash_attention --only sdpa,triton_tutorial_flash_v2,colfax_cutlass --batch 128 --input-id 3 --num-inputs 5 --n-heads 8 --d-head 128 --metrics latency,tflops
  SeqLen    sdpa-latency    sdpa-tflops    triton_tutorial_flash_v2-latency    triton_tutorial_flash_v2-tflops    colfax_cutlass-latency    colfax_cutlass-tflops
--------  --------------  -------------  ----------------------------------  ---------------------------------  ------------------------  -----------------------
    1024         1.91248        287.457                             1.55574                            353.372                   1.38538                  396.828
    2048         7.49987        293.208                             5.70656                            385.35                    5.4792                   401.34
    4096        29.4748         298.428                            21.7369                             404.662                  20.8335                   422.21
    8192       122.297          287.696                            85.1293                             413.305                  82.3884                   427.055
   16384       462.649          304.199                           334.992                              420.122                 328.363                    428.604
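
For context, the following is a minimal sketch (not the PR's actual diff) of how a backend such as colfax_cutlass is typically hooked into a TritonBench operator so that it can be selected via --only colfax_cutlass. The torchbenchmark.util.triton_op import path, the register_benchmark(enabled=...) keyword, and the fmha_forward entry point of the compiled extension are assumptions here.

import torch

# Assumed framework module path; the real operator lives in the
# torchbenchmark tree and subclasses its BenchmarkOperator base class.
from torchbenchmark.util.triton_op import BenchmarkOperator, register_benchmark

try:
    # The CUTLASS FMHA extension built by `install.py --cutlass` is assumed
    # here to expose a forward kernel with a (q, k, v) -> output signature.
    from colfax_cutlass import fmha_forward  # hypothetical module / function
    HAS_COLFAX_CUTLASS = True
except ImportError:
    HAS_COLFAX_CUTLASS = False


class Operator(BenchmarkOperator):
    @register_benchmark(enabled=HAS_COLFAX_CUTLASS)  # assumed decorator keyword
    def colfax_cutlass(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # Return a zero-argument callable so the harness can time repeated
        # invocations and derive latency / tflops from the input shapes.
        def _run() -> torch.Tensor:
            return fmha_forward(q, k, v)

        return _run

With a registration along these lines in place, the new backend shows up next to sdpa and triton_tutorial_flash_v2 in runs like the one above.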

xuzhao9 changed the title from "Fix flash attention kernel" to "Add colfax_cutlass backend to flash_attention operator" on Jun 15, 2024
facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aaronenyeshi (Member) left a comment

LGTM!

facebook-github-bot (Contributor) commented

@xuzhao9 merged this pull request in d5f0a12.

xuzhao9 deleted the xz9/fix-attn branch on June 19, 2024 at 18:10