Starred repositories by KyeeHuang

A high-performance, large-capacity, multi-tenant, persistent, Redis-compatible elastic KV storage system built on RocksDB, with strong data consistency based on Raft

C++ 9 7 Updated Oct 12, 2024

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 581 52 Updated Apr 7, 2024

A simplified flash-attention implementation built with cutlass, intended for teaching

Cuda 30 1 Updated Aug 12, 2024

An fp8 flash attention implementation for the Ada architecture, built with the cutlass repository

Cuda 48 3 Updated Aug 12, 2024

Material for gpu-mode lectures

Jupyter Notebook 2,691 267 Updated Oct 11, 2024

A high-performance, large-capacity, multi-tenant, persistent, Redis-compatible elastic KV storage system built on RocksDB, with strong data consistency based on Raft

C++ 197 63 Updated Sep 28, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1,077 23 Updated Jul 31, 2024

CUDA Templates for Linear Algebra Subroutines

C++ 5,489 934 Updated Oct 9, 2024
Jupyter Notebook 62 6 Updated Jul 23, 2024

how to learn PyTorch and OneFlow

338 20 Updated Mar 22, 2024

A programmer's guide to cooking at home (Simplified Chinese only).

Dockerfile 66,850 8,706 Updated Oct 6, 2024

Source code for all entries from the 2023 ZPrize competition

C++ 18 6 Updated Jun 15, 2024

Fast and memory-efficient exact attention

Python 13,753 1,260 Updated Oct 14, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 28,361 4,200 Updated Oct 14, 2024

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,486 147 Updated Sep 25, 2024

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 6,995 523 Updated Aug 18, 2024

Pika is a Redis-Compatible database developed by Qihoo's infrastructure team.

C++ 5,861 1,191 Updated Oct 14, 2024

A fast, light-weight proxy for memcached and redis

C 12,131 2,055 Updated Mar 29, 2024

SPDK mirror of DPDK

C 56 52 Updated Sep 11, 2024

Public repository for the DAC 2021 paper "Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs"

C++ 14 5 Updated Aug 29, 2021
Rust 15 6 Updated Oct 9, 2023

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 3,564 273 Updated Oct 1, 2024

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,266 135 Updated Oct 14, 2024

GNU Libc - Extremely old repo used for research purposes years ago. Please do not rely on this repo.

C 1,849 950 Updated Aug 24, 2018
C++ 9,248 4,407 Updated Oct 14, 2024

Chinese Manual Pages

Roff 1,317 134 Updated Jun 30, 2023

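Training program for new members of the Distributed Storage and Computing Lab at the University of Electronic Science and Technology of China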

886 166 Updated Sep 9, 2024

An unofficial cuda assembler, for all generations of SASS, hopefully :)

Python 395 71 Updated Apr 20, 2023