HuangliangDai
Popular repositories

  1. Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
     Forked from shixun404/Fault-Tolerant-SGEMM-on-NVIDIA-GPUs
     Language: CUDA

  2. ByteTransformer
     Forked from bytedance/ByteTransformer
     Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052
     Language: C++

  3. fused-attention
     Forked from kst179/fused-attention
     Fast and low-memory attention layer written in CUDA
     Language: CUDA

  4. flash-attention
     Forked from Dao-AILab/flash-attention
     Fast and memory-efficient exact attention
     Language: Python

  5. tiny-flash-attention
     Forked from 66RING/tiny-flash-attention
     Flash attention tutorial written in Python, Triton, CUDA, and CUTLASS
     Language: CUDA

  6. flash_attention_inference
     Forked from ShaYeBuHui01/flash_attention_inference
     Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios
     Language: C++
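
Four of the six forks (fused-attention, flash-attention, tiny-flash-attention, flash_attention_inference) revolve around FlashAttention-style kernels, whose central trick is tiling the attention computation and maintaining a running softmax so the full N x N score matrix is never materialized. The snippet below is a minimal NumPy sketch of that online-softmax idea only; it is not code from any of the repositories above, whose actual kernels are written in CUDA, Triton, and CUTLASS and fuse these steps into a single pass over on-chip tiles.

    # Illustrative toy of the tiled "online softmax" idea behind FlashAttention.
    # Not taken from the forked repositories; names and block_size are arbitrary.
    import numpy as np

    def flash_attention_reference(Q, K, V, block_size=64):
        """Compute softmax(Q K^T / sqrt(d)) V one key/value tile at a time,
        keeping a running row-wise max (m) and normalizer (l) so the full
        N x N score matrix is never stored."""
        N, d = Q.shape
        scale = 1.0 / np.sqrt(d)
        O = np.zeros_like(Q, dtype=np.float64)
        m = np.full(N, -np.inf)   # running max of scores per query row
        l = np.zeros(N)           # running softmax denominator per query row

        for start in range(0, N, block_size):
            Kb = K[start:start + block_size]      # (B, d) key tile
            Vb = V[start:start + block_size]      # (B, d) value tile
            S = (Q @ Kb.T) * scale                # (N, B) scores for this tile

            m_new = np.maximum(m, S.max(axis=1))  # updated running max
            p = np.exp(S - m_new[:, None])        # unnormalized tile probabilities
            correction = np.exp(m - m_new)        # rescale previous partial results
            l = l * correction + p.sum(axis=1)
            O = O * correction[:, None] + p @ Vb
            m = m_new

        return O / l[:, None]

    # Sanity check against the naive (fully materialized) formulation.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((256, 64)) for _ in range(3))
    S = Q @ K.T / np.sqrt(64)
    ref = np.exp(S - S.max(axis=1, keepdims=True))
    ref = (ref / ref.sum(axis=1, keepdims=True)) @ V
    assert np.allclose(flash_attention_reference(Q, K, V), ref, atol=1e-6)

The running-max and rescaling steps are what make the blockwise result exactly equal to ordinary attention, which is why the production kernels can process long sequences with memory proportional to the tile size rather than the full sequence length.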