CUTLASS Kernels

A library of CUTLASS kernels targeting large language model (LLM) workloads.

Building

Download CUTLASS by following the instructions at https://github.com/NVIDIA/cutlass.
Edit the hardcoded CUTLASS path in the sample compile.sh so that it points to your CUTLASS directory.
Run the modified script with ./compile.sh.

Running

When running the executable, set NVIDIA_TF32_OVERRIDE=1 to enable TF32 mode for cuBLAS SGEMM. Otherwise, cuBLAS computes in full float32 precision.
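TF32 keeps the 8-bit exponent of float32 but only 10 explicit mantissa bits (float32 has 23), so cuBLAS SGEMM results with and without TF32 can differ slightly. The C++ sketch below emulates that input rounding on the CPU to illustrate the gap. The helper names round_to_tf32 and dot are hypothetical, and the round-to-nearest-even conversion is an assumption, not a statement of exactly what cuBLAS or the Tensor Cores do.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: emulate rounding a float32 value to TF32 precision.
// TF32 keeps float32's 8-bit exponent but only 10 explicit mantissa bits.
// Assumption: round-to-nearest-even on the 13 discarded mantissa bits; the
// conversion used by cuBLAS / Tensor Cores may differ.
float round_to_tf32(float x) {
    if (!std::isfinite(x)) return x;                   // pass inf/NaN through unchanged
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);               // well-defined type punning
    const std::uint32_t kept_lsb = (bits >> 13) & 1u;  // lowest mantissa bit that survives
    bits += 0x0FFFu + kept_lsb;                        // round half to even
    bits &= 0xFFFFE000u;                               // zero the 13 low mantissa bits
    float y;
    std::memcpy(&y, &bits, sizeof y);
    return y;
}

// Hypothetical helper: dot product of two equal-length vectors. With tf32 == true
// the inputs are rounded to TF32 first while the sum stays in float32, loosely
// mimicking TF32 Tensor Core math (TF32 inputs, FP32 accumulation).
float dot(const std::vector<float>& a, const std::vector<float>& b, bool tf32) {
    assert(a.size() == b.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const float x = tf32 ? round_to_tf32(a[i]) : a[i];
        const float y = tf32 ? round_to_tf32(b[i]) : b[i];
        acc += x * y;
    }
    return acc;
}
```

For example, with a = {1 + 2^-12, 1 + 2^-12} and b = {1, 1}, dot(a, b, false) returns 2 + 2^-11, while dot(a, b, true) returns exactly 2: near 1.0 the TF32 spacing is 2^-10, so 1 + 2^-12 rounds down to 1.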

Notes

See the README.md files in the subdirectories for more specific instructions.