ranpin


Happy needs no error!

1 follower · 5 following

Wula
Beijing, China
https://ranpin.github.io/
https://blog.csdn.net/weixin_44496128?

Stars

4 starred repositories written in Cuda


BBuf / how-to-optim-algorithm-in-cuda

How to optimize some algorithms in CUDA.

Cuda · 1,471 stars · 122 forks · Updated Sep 29, 2024

DefTruth / CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda · 1,196 stars · 128 forks · Updated Sep 29, 2024

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda · 566 stars · 50 forks · Updated Apr 7, 2024

jundaf2 / INT8-Flash-Attention-FMHA-Quantization

Cuda · 152 stars · 16 forks · Updated Sep 15, 2023
