- Shanghai Jiao Tong University
- Shanghai
- raphael-hao.top
Language: Cuda
Sort by: Most stars
Starred repositories
How to optimize some algorithms in CUDA.
[MICRO'23, MLSys'22] TorchSparse: Efficient Training and Inference Framework for Sparse Convolution on GPUs.
FlashInfer: Kernel Library for LLM Serving
Examples demonstrating available options to program multiple GPUs in a single node or a cluster
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
Gallatin is a general-purpose memory manager for CUDA that allows for threads to quickly malloc and free memory of arbitrary size inside of kernels.