Add colfax_cutlass backend to flash_attention operator #2296

Closed
wants to merge 13 commits

Conversation

xuzhao9 (Contributor) commented Jun 12, 2024

Add colfax_cutlass kernel compilation:

$ python install.py --userbenchmark triton --cutlass

Run with the sdpa, triton_tutorial_flash_v2, and colfax_cutlass backends on H100:

$ python run_benchmark.py triton --op flash_attention --only sdpa,triton_tutorial_flash_v2,colfax_cutlass --batch 128 --input-id 3 --num-inputs 5 --n-heads 8 --d-head 128 --metrics latency,tflops
  SeqLen    sdpa-latency    sdpa-tflops    triton_tutorial_flash_v2-latency    triton_tutorial_flash_v2-tflops    colfax_cutlass-latency    colfax_cutlass-tflops
--------  --------------  -------------  ----------------------------------  ---------------------------------  ------------------------  -----------------------
    1024         1.91248        287.457                             1.55574                            353.372                   1.38538                  396.828
    2048         7.49987        293.208                             5.70656                            385.35                    5.4792                   401.34
    4096        29.4748         298.428                            21.7369                             404.662                  20.8335                   422.21
    8192       122.297          287.696                            85.1293                             413.305                  82.3884                   427.055
   16384       462.649          304.199                           334.992                              420.122                 328.363                    428.604
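
For context, the following is a minimal sketch (not the PR's actual diff) of how a backend such as colfax_cutlass is typically hooked into a TritonBench operator so that it can be selected via --only colfax_cutlass. The torchbenchmark.util.triton_op import path, the register_benchmark(enabled=...) keyword, and the fmha_forward entry point of the compiled extension are assumptions here.

import torch

# Assumed framework module path; the real operator lives in the
# torchbenchmark tree and subclasses its BenchmarkOperator base class.
from torchbenchmark.util.triton_op import BenchmarkOperator, register_benchmark

try:
    # The CUTLASS FMHA extension built by `install.py --cutlass` is assumed
    # here to expose a forward kernel with a (q, k, v) -> output signature.
    from colfax_cutlass import fmha_forward  # hypothetical module / function
    HAS_COLFAX_CUTLASS = True
except ImportError:
    HAS_COLFAX_CUTLASS = False


class Operator(BenchmarkOperator):
    @register_benchmark(enabled=HAS_COLFAX_CUTLASS)  # assumed decorator keyword
    def colfax_cutlass(self, q: torch.Tensor, k: torch.Tensor, v: torch.Tensor):
        # Return a zero-argument callable so the harness can time repeated
        # invocations and derive latency / tflops from the input shapes.
        def _run() -> torch.Tensor:
            return fmha_forward(q, k, v)

        return _run

With a registration along these lines in place, the new backend shows up next to sdpa and triton_tutorial_flash_v2 in runs like the one above.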

xuzhao9 changed the title from "Fix flash attention kernel" to "Add colfax_cutlass backend to flash_attention operator" on Jun 15, 2024
facebook-github-bot (Contributor) commented

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

aaronenyeshi (Member) left a comment

LGTM!

facebook-github-bot (Contributor) commented

@xuzhao9 merged this pull request in d5f0a12.

xuzhao9 deleted the xz9/fix-attn branch on June 19, 2024 at 18:10