KyeeHuang

Starred repositories

arana-db / kiwi

a high-performance, large-capacity, multi-tenant, data-persistent, strong data consistency based on raft, Redis-compatible elastic KV data storage system based on RocksDB

C++ 9 7 Updated Oct 12, 2024

tspeterkim / flash-attention-minimal

Flash Attention in ~100 lines of CUDA (forward pass only)

Cuda 581 52 Updated Apr 7, 2024

weishengying / tiny-flash-attention

A minimal flash-attention implementation using cutlass, intended for teaching purposes

Cuda 30 1 Updated Aug 12, 2024

weishengying / cutlass_flash_atten_fp8

FP8 flash attention implemented with the cutlass library on the Ada architecture

Cuda 48 3 Updated Aug 12, 2024

gpu-mode / lectures

Material for gpu-mode lectures

Jupyter Notebook 2,691 267 Updated Oct 11, 2024

OpenAtomFoundation / pikiwidb

a high-performance, large-capacity, multi-tenant, data-persistent, strong data consistency based on raft, Redis-compatible elastic KV data storage system based on RocksDB

C++ 197 63 Updated Sep 28, 2024

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

1,077 23 Updated Jul 31, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ 5,489 934 Updated Oct 9, 2024

madsys-dev / deepseekv2-profile

Jupyter Notebook 62 6 Updated Jul 23, 2024

BBuf / how-to-learn-deep-learning-framework

how to learn PyTorch and OneFlow

338 20 Updated Mar 22, 2024

Anduin2017 / HowToCook

Programmer's guide about how to cook at home (Simplified Chinese only).

Dockerfile 66,850 8,706 Updated Oct 6, 2024

z-prize / 2023-entries

Source code for all entries from the 2023 ZPrize competition

C++ 18 6 Updated Jun 15, 2024

Dao-AILab / flash-attention

Fast and memory-efficient exact attention

Python 13,753 1,260 Updated Oct 14, 2024

chenzomi12 / chenzomi12.github.io

HTML 190 34 Updated Oct 9, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 28,361 4,200 Updated Oct 14, 2024

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

3,486 147 Updated Sep 25, 2024

adam-maj / tiny-gpu

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 6,995 523 Updated Aug 18, 2024

OpenAtomFoundation / pika

Pika is a Redis-Compatible database developed by Qihoo's infrastructure team.

C++ 5,861 1,191 Updated Oct 14, 2024

twitter / twemproxy

A fast, light-weight proxy for memcached and redis

C 12,131 2,055 Updated Mar 29, 2024

spdk / dpdk

SPDK mirror of DPDK

C 56 52 Updated Sep 11, 2024

AlbertoParravicini / approximate-spmv-topk

Public repository for the DAC 2021 paper "Scaling up HBM Efficiency of Top-K SpMV for Approximate Embedding Similarity on FPGAs"

C++ 14 5 Updated Aug 29, 2021

Xilinx / Vitis-HLS-Introductory-Examples

C++ 585 158 Updated Jun 7, 2024

Quarky93 / warpshell

Rust 15 6 Updated Oct 9, 2023

turboderp / exllamav2

A fast inference library for running LLMs locally on modern consumer-class GPUs

Python 3,564 273 Updated Oct 1, 2024

DefTruth / CUDA-Learn-Notes

🎉 Modern CUDA Learn Notes with PyTorch: fp32, fp16, bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot, elementwise, softmax, layernorm, rmsnorm.

Cuda 1,266 135 Updated Oct 14, 2024

lattera / glibc

GNU Libc - Extremely old repo used for research purposes years ago. Please do not rely on this repo.

C 1,849 950 Updated Aug 24, 2018

gcc-mirror / gcc

C++ 9,248 4,407 Updated Oct 14, 2024

man-pages-zh / manpages-zh

Forked from lidaobing/manpages-zh

Chinese Manual Pages

Roff 1,317 134 Updated Jun 30, 2023

CDDSCLab / training-plan

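Training plan for new members of the Distributed Storage and Computing Lab at the University of Electronic Science and Technology of China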

886 166 Updated Sep 9, 2024

cloudcores / CuAssembler

An unofficial cuda assembler, for all generations of SASS, hopefully :)

Python 395 71 Updated Apr 20, 2023

Starred topics

Linux