Weihao Cui Raphael-Hao

🪄

Mogic

48 followers · 37 following

Shanghai Jiao Tong University
Shanghai
raphael-hao.top

Starred repositories

18 stars written in Cuda

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda · 21,652 stars · 2,356 forks · Updated Jul 12, 2024

HazyResearch / ThunderKittens

Tile primitives for speedy kernels

Cuda · 1,374 stars · 47 forks · Updated Jul 12, 2024

BBuf / how-to-optim-algorithm-in-cuda

how to optimize some algorithm in cuda.

Cuda · 1,200 stars · 99 forks · Updated Jul 12, 2024

antonmks / Alenka

GPU database engine

Cuda · 1,171 stars · 120 forks · Updated Jan 30, 2017

mit-han-lab / torchsparse

[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.

Cuda · 1,156 stars · 131 forks · Updated Jul 11, 2024

flashinfer-ai / flashinfer

FlashInfer: Kernel Library for LLM Serving

Cuda · 786 stars · 69 forks · Updated Jul 12, 2024

NVIDIA / multi-gpu-programming-models

Examples demonstrating available options to program multiple GPUs in a single node or a cluster

Cuda · 490 stars · 101 forks · Updated Mar 14, 2024

Yinghan-Li / YHs_Sample

Yinghan's Code Sample

Cuda · 259 stars · 47 forks · Updated Jul 25, 2022

AlibabaResearch / flash-llm

Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity

Cuda · 158 stars · 12 forks · Updated Sep 24, 2023

nicolaswilde / cuda-tensorcore-hgemm

Cuda · 89 stars · 18 forks · Updated Aug 25, 2022

mit-han-lab / Quest

[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference

Cuda · 83 stars · 5 forks · Updated Jul 3, 2024

LeiWang1999 / tvm_gpu_gemm

play gemm with tvm

Cuda · 79 stars · 10 forks · Updated Jul 22, 2023

UofT-EcoSystem / Minuet

[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs

Cuda · 68 stars · 2 forks · Updated Jun 7, 2024

gsampler9 / gSampler

Cuda · 22 stars · 4 forks · Updated Jun 20, 2024

microsoft / ConvStencil

Cuda · 18 stars · 5 forks · Updated Apr 10, 2024

zheng-ningxin / SparTA

Cuda · 7 stars · 1 fork · Updated Aug 9, 2023

saltsystemslab / gallatin

Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.

Cuda · 4 stars · Updated Mar 4, 2024

zheng-ningxin / nmsparse

Cuda · 3 stars · Updated Feb 7, 2023

Starred topics

Linux