Run PyTorch LLMs locally on servers, desktop and mobile

Python 2,005 118 Updated Aug 4, 2024

nanoGPT style version of Llama 3.1

Python 483 12 Updated Aug 1, 2024

The React library for LLMs

TypeScript 323 12 Updated Jun 27, 2024

#1 Locally hosted web application that allows you to perform various operations on PDF files

Java 35,513 2,640 Updated Aug 3, 2024

The Supabase for RAG - R2R lets you build, scale, and manage user-facing Retrieval-Augmented Generation applications in production.

Python 2,899 195 Updated Aug 3, 2024

[ICML 2024 Oral] Any-Precision LLM: Low-Cost Deployment of Multiple, Different-Sized LLMs

Python 42 3 Updated Jul 22, 2024

LLM101n: Let's build a Storyteller

26,457 1,422 Updated Aug 1, 2024

An LLM-powered knowledge curation system that researches a topic and generates a full-length report with citations.

Python 8,919 835 Updated Aug 3, 2024

🦁 Lion, a new optimizer discovered by Google Brain using genetic algorithms that is purportedly better than Adam(w), in PyTorch

Python 1,981 49 Updated Jun 15, 2024

PygmalionAI's large-scale inference engine

Python 843 95 Updated Aug 3, 2024
Python 112 13 Updated Jul 23, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

948 20 Updated Jul 31, 2024
Python 89 4 Updated Jun 12, 2024

Using GPT to parse PDF

Python 2,558 193 Updated Aug 1, 2024

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python 363 15 Updated Jul 19, 2024

A native PyTorch Library for large model training

Python 1,396 126 Updated Aug 4, 2024

A minimal GPU design in Verilog to learn how GPUs work from the ground up

SystemVerilog 6,761 504 Updated Jun 14, 2024

HF-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 111 7 Updated Aug 2, 2024

Video+code lecture on building nanoGPT from scratch

Python 3,160 406 Updated Jul 26, 2024

llama3.cuda is a pure C/CUDA implementation of the Llama 3 model.

Cuda 277 18 Updated Jun 4, 2024

Implementation for MatMul-free LM.

Python 2,782 169 Updated Jun 27, 2024

Tile primitives for speedy kernels

Cuda 1,426 53 Updated Aug 4, 2024

FlagGems is an operator library for large language models implemented in Triton Language.

Python 187 10 Updated Aug 2, 2024

Puzzles for learning Triton

Jupyter Notebook 902 54 Updated Jul 17, 2024

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Python 13,659 901 Updated Aug 4, 2024

🐚 OpenDevin: Code Less, Make More

Python 29,410 3,398 Updated Aug 4, 2024

Sample code for my CUDA programming book

Cuda 1,480 313 Updated Jul 27, 2023

LLM training in simple, raw C/CUDA

Cuda 22,448 2,495 Updated Aug 3, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 929 84 Updated Aug 4, 2024

⛄ Possibly the smallest compiler ever

JavaScript 27,714 2,838 Updated Feb 19, 2024