Popular repositories
- Fault-Tolerant-SGEMM-on-NVIDIA-GPUs (Cuda). Forked from shixun404/Fault-Tolerant-SGEMM-on-NVIDIA-GPUs.
- ByteTransformer (C++). Forked from bytedance/ByteTransformer. Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
- fused-attention (Cuda). Forked from kst179/fused-attention. Fast and low-memory attention layer written in CUDA.
- flash-attention (Python). Forked from Dao-AILab/flash-attention. Fast and memory-efficient exact attention.
- tiny-flash-attention (Cuda). Forked from 66RING/tiny-flash-attention. Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS.
- flash_attention_inference (C++). Forked from ShaYeBuHui01/flash_attention_inference. Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.