Shanghai Jiao Tong University
- Shanghai
- raphael-hao.top
Starred repositories
OpenAI ChatGPT, GPT-3, GPT-4, DALL·E, Whisper API wrapper for Go
[ICML 2024] Quest: Query-Aware Sparsity for Efficient Long-Context LLM Inference
Robust Speech Recognition via Large-Scale Weak Supervision
Clone a voice in 5 seconds to generate arbitrary speech in real-time
[EuroSys'24] Minuet: Accelerating 3D Sparse Convolutions on GPUs
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
This is the official GitHub repo of Think-on-Graph. If you are interested in our work or willing to join our research team in Shenzhen, please feel free to contact us by email ([email protected])
Stick Rules -- Quantumult X / Loon / ClashX Rules \ Quantumult back to CN Rules
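毒奶 (Dunai)'s personal ready-to-use configuration (Quantumult X): ad-blocking routing rules, TikTok unlock rewrite, VSCO unlock, 神机 (Shenji) routing rules, and blackmatrix7 routing rules.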
A fast communication-overlapping library for tensor parallelism on GPUs.
[OSDI'24] Serving LLM-based Applications Efficiently with Semantic Variable
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Interference-aware CPU scheduling that enables performance isolation and high CPU utilization for datacenter servers
OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched
Gallatin is a general-purpose memory manager for CUDA that allows threads to quickly malloc and free memory of arbitrary size inside kernels.