Popular repositories
- Fault-Tolerant-SGEMM-on-NVIDIA-GPUs (Cuda). Forked from shixun404/Fault-Tolerant-SGEMM-on-NVIDIA-GPUs.
- ByteTransformer (C++). Forked from bytedance/ByteTransformer. Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
- fused-attention (Cuda). Forked from kst179/fused-attention. Fast and low-memory attention layer written in CUDA.
- flash-attention (Python). Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention.
- tiny-flash-attention (Cuda). Forked from 66RING/tiny-flash-attention. Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
- flash_attention_inference (C++). Forked from ShaYeBuHui01/flash_attention_inference. Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.