📖A curated list of Awesome LLM Inference Papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

1,952 134 Updated Jul 12, 2024

jiazhihao / attention_superoptimizer

An Attention Superoptimizer

C++ 19 Updated May 9, 2024

SeoLabCornell / torch2chip

Torch2Chip (MLSys, 2024)

Python 42 3 Updated Jun 25, 2024

NVIDIA / deepops

Tools for building GPU clusters

Shell 1,231 316 Updated Mar 8, 2024

yhzhang0128 / egos-2000

Envision a world where EVERY student can read ALL the code of a teaching operating system.

C 2,162 154 Updated Jun 22, 2024

spcl / rFaaS

rFaaS: a high-performance FaaS platform with RDMA acceleration for low-latency invocations.

C++ 45 15 Updated Mar 17, 2024

deepseek-ai / DeepSeek-V2

DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

2,945 108 Updated Jun 26, 2024

karpathy / llm.c

LLM training in simple, raw C/CUDA

Cuda 21,648 2,355 Updated Jul 12, 2024

microsoft / SuperScaler

An experimental parallel training platform

40 10 Updated Mar 25, 2024

InternLM / AcmeTrace

Jupyter Notebook 110 6 Updated Mar 12, 2024

joonspk-research / generative_agents

Generative Agents: Interactive Simulacra of Human Behavior

15,836 1,996 Updated Jun 3, 2024

openppl-public / ppl.nn

A primitive library for neural networks

C++ 1,244 210 Updated Jul 10, 2024

apple / corenet

CoreNet: A library for training deep neural networks

Python 6,740 521 Updated May 28, 2024

microsoft / BitBLAS

BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.

Python 210 22 Updated Jul 11, 2024

uchuhimo / amanda

Python 13 1 Updated Apr 21, 2024

jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,338 428 Updated May 3, 2024

Weihao Cui Raphael-Hao

Starred repositories

Linux