Stars
Run PyTorch LLMs locally on servers, desktop and mobile
Locally hosted web application that allows you to perform various operations on PDF files
The Supabase for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.
[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs
An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.
🦁 Lion, new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in Pytorch
PygmalionAI's large-scale inference engine
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
A native PyTorch Library for large model training
A minimal GPU design in Verilog to learn how GPUs work from the ground up
HF-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Video+code lecture on building nanoGPT from scratch
llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.
FlagGems is an operator library for large language models implemented in Triton Language.
Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory
Sample codes for my CUDA programming book
FlashInfer: Kernel Library for LLM Serving
⛄ Possibly the smallest compiler ever