LLM Infer, AI Infra, CUDA
Pinned
- cuda_hgemm (Public): Several optimization methods for half-precision general matrix multiplication (HGEMM) using Tensor Cores via the WMMA API and MMA PTX instructions.
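As a minimal sketch of the WMMA approach the repo description mentions (not the repo's actual kernel; the kernel name, tiling, and launch geometry here are illustrative), one warp can compute a 16x16 output tile on Tensor Cores like this:

```cuda
#include <cuda_fp16.h>
#include <mma.h>
using namespace nvcuda;

// Illustrative WMMA HGEMM: each warp computes one 16x16 tile of D = A * B.
// Assumes M, N, K are multiples of 16; A is row-major, B is column-major, fp16.
__global__ void wmma_hgemm_sketch(const half *a, const half *b, half *d,
                                  int M, int N, int K) {
    int warp_m = (blockIdx.x * blockDim.x + threadIdx.x) / warpSize;  // tile row
    int warp_n = blockIdx.y * blockDim.y + threadIdx.y;               // tile col

    wmma::fragment<wmma::matrix_a, 16, 16, 16, half, wmma::row_major> a_frag;
    wmma::fragment<wmma::matrix_b, 16, 16, 16, half, wmma::col_major> b_frag;
    wmma::fragment<wmma::accumulator, 16, 16, 16, half> acc_frag;
    wmma::fill_fragment(acc_frag, __float2half(0.0f));

    // Walk the K dimension 16 columns at a time, accumulating on Tensor Cores.
    for (int k = 0; k < K; k += 16) {
        wmma::load_matrix_sync(a_frag, a + warp_m * 16 * K + k, K);
        wmma::load_matrix_sync(b_frag, b + warp_n * 16 * K + k, K);
        wmma::mma_sync(acc_frag, a_frag, b_frag, acc_frag);
    }
    wmma::store_matrix_sync(d + warp_m * 16 * N + warp_n * 16, acc_frag,
                            N, wmma::mem_row_major);
}
```

The MMA PTX path replaces these intrinsics with inline `mma.sync` PTX for finer control over register layout and shared-memory staging, which is where most of the optimization headroom lies.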
- cuda_hgemv (Public): Several optimization methods for half-precision general matrix-vector multiplication (HGEMV) using CUDA cores.
- matrix_multiply (Public): Several common matrix multiplication methods implemented on the CPU and on NVIDIA GPUs using C++11 and CUDA.
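The CPU baseline such a comparison typically starts from is the naive triple loop; a sketch (function name and row-major flat-array layout are assumptions, not the repo's API):

```cpp
#include <cstddef>
#include <vector>

// Naive matrix multiply: C (m x n) = A (m x k) * B (k x n),
// all matrices stored row-major in flat vectors.
std::vector<float> matmul_naive(const std::vector<float> &A,
                                const std::vector<float> &B,
                                std::size_t m, std::size_t k, std::size_t n) {
    std::vector<float> C(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p) {
            float a = A[i * k + p];  // hoist the A element out of the inner loop
            for (std::size_t j = 0; j < n; ++j)
                C[i * n + j] += a * B[p * n + j];
        }
    return C;
}
```

The i-p-j loop order keeps the inner loop streaming contiguously over a row of B and C, which is the usual first cache-friendliness step before blocking/tiling.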
- cuda_back2back_hgemm (Public): Uses Tensor Cores to compute back-to-back HGEMM (half-precision general matrix multiplication) with MMA PTX instructions.