
Compile FBGEMM on H100 #2298

Closed
wants to merge 2 commits into from

Conversation

xuzhao9
Contributor

@xuzhao9 xuzhao9 commented Jun 12, 2024

H100 requires the special 9.0a entry in the CUDA arch list to compile the full set of kernels.
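For reference, a minimal sketch of how the extra arch entry is typically passed when building from source. The variable name follows the standard PyTorch/FBGEMM build convention; treat its applicability to this exact build as an assumption:

```shell
# Assumption: the build honors the standard TORCH_CUDA_ARCH_LIST convention.
# "9.0a" maps to nvcc's -gencode arch=compute_90a,code=sm_90a, which enables
# Hopper-specific instructions that the plain "9.0" target does not.
export TORCH_CUDA_ARCH_LIST="9.0a"
```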

Test plan:
https://github.com/pytorch/benchmark/actions/runs/9489468560

Benchmark result:
OSS Repro on H100 devgpu 500W:

(py311) [[email protected] ~/local/benchmark (xz9/add-fbgemm)]$ CUDA_VISIBLE_DEVICES=4 python run_benchmark.py triton --op fp8_gemm_rowwise  --m 4 --n 3584 --k 8192 --num-inputs 1 --only _triton,_cutlass

(M, N, K)    _triton-tflops    _cutlass-accuracy    _cutlass-speedup    _cutlass-tflops
---------------  ----------------  -------------------  ------------------  -----------------
(4, 3584, 8192)           2.93719                    1             2.72519             8.0044
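As a sanity check on the table above, the _cutlass-speedup column matches the ratio of the two tflops columns (this interpretation is an inference from the numbers, not stated in the tool's docs):

```shell
# _cutlass-speedup appears to be _cutlass-tflops / _triton-tflops:
awk 'BEGIN { printf "%.5f\n", 8.0044 / 2.93719 }'   # prints 2.72519
```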

Internal fbcode repro:

$ CUDA_VISIBLE_DEVICES=4 buck2 run @mode/opt  -c fbcode.nvcc_arch=h100a -c fbcode.platform010_cuda_version=12.4  //pytorch/benchmark:triton -- --op fp8_gemm_rowwise  --m 4 --n 3584 --k 8192 --num-inputs 1 --only _triton,_cutlass

      (M, N, K)    _triton-tflops    _cutlass-accuracy    _cutlass-speedup    _cutlass-tflops
---------------  ----------------  -------------------  ------------------  -----------------
(4, 3584, 8192)           6.28427                    1             1.28493            8.07484

Both runs use the same FBGEMM commit hash. The OSS Triton version seems to perform worse than the Meta internal version, but the CUTLASS kernel performance is similar.

@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@facebook-github-bot
Contributor

@xuzhao9 merged this pull request in c8d6c2a.

@xuzhao9 xuzhao9 deleted the xz9/fix-fbgemm branch June 12, 2024 22:26
@xuzhao9
Contributor Author

xuzhao9 commented Jul 4, 2024

Updated result on 20240704:
OSS:

$ CUDA_VISIBLE_DEVICES=4 python run_benchmark.py triton --op fp8_gemm_rowwise  --m 4 --n 3584 --k 8192 --num-inputs 1 --only _triton,_cutlass --metrics tflops
      (M, N, K)    _triton-tflops    _cutlass-tflops
---------------  ----------------  -----------------
(4, 3584, 8192)           10.2685            14.0767

Internal:

      (M, N, K)    _triton-tflops    _cutlass-accuracy    _cutlass-speedup    _cutlass-tflops
---------------  ----------------  -------------------  ------------------  -----------------
(4, 3584, 8192)           8.40547                    0             1.68237            14.1411
