DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.

C++ · 135 stars · 15 forks · Updated Aug 27, 2024

alibaba / rtp-llm

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ · 539 stars · 49 forks · Updated Oct 14, 2024

xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism

Python · 655 stars · 53 forks · Updated Nov 5, 2024

SqueezeAILab / KVQuant

[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization

Python · 302 stars · 25 forks · Updated Aug 13, 2024

yuweihao / MambaOut

MambaOut: Do We Really Need Mamba for Vision?

Python · 2,023 stars · 34 forks · Updated Oct 22, 2024

inferflow / inferflow

Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).

C++ · 236 stars · 24 forks · Updated Mar 15, 2024

OpenPPL / ppl.nn

A primitive library for neural networks

C++ · 1,290 stars · 215 forks · Updated Nov 5, 2024

DefTruth / Awesome-LLM-Inference

📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,771 stars · 192 forks · Updated Nov 1, 2024

hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,140 stars · 67 forks · Updated Oct 14, 2024

MegEngine / InferLLM

A lightweight LLM inference framework

C++ · 699 stars · 87 forks · Updated Apr 7, 2024

PKU-DAIR / Hetu-Galvatron

Forked from AFDWang/Hetu-Galvatron

Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).

Python · 34 stars · 3 forks · Updated Nov 4, 2024

zhangxs janicevidal

Stars

efeslab / Nanoflow

poppuppy / alpha-free-matting

luchangli03 / onnxsim_large_model

HumanAIGC / OutfitAnyone

cszzshi / SimD

LilianHollard / LeYOLO

Atten4Vis / LW-DETR

modelscope / dash-infer

alibaba / rtp-llm

xdit-project / xDiT

SqueezeAILab / KVQuant

yuweihao / MambaOut

inferflow / inferflow

OpenPPL / ppl.nn

DefTruth / Awesome-LLM-Inference

hao-ai-lab / LookaheadDecoding

MegEngine / InferLLM

PKU-DAIR / Hetu-Galvatron

zhuqinfeng1999 / Samba

Stephen0808 / ICELUT

BBuf / tvm_mlir_learn

Traffic-X / ViT-CoMer

icandle / CAMixerSR

mit-han-lab / distrifuser

ggerganov / llama.cpp

ucas-vg / Effective-Fusion-Factor

raoyongming / HorNet

pengzhiliang / Conformer

cv516Buaa / tph-yolov5

karpathy / llama2.c