Stars
A throughput-oriented high-performance serving framework for LLMs
Outfit Anyone: Ultra-high quality virtual try-on for Any Clothing and Any Person
This repository is an official implementation of the paper "LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection".
DashInfer is a native LLM inference engine aiming to deliver industry-leading performance atop various hardware architectures, including x86 and ARMv9.
RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
[NeurIPS 2024] KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
MambaOut: Do We Really Need Mamba for Vision?
Inferflow is an efficient and highly configurable inference engine for large language models (LLMs).
📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
PKU-DAIR / Hetu-Galvatron (forked from AFDWang/Hetu-Galvatron): Galvatron is an automatic distributed training system designed for Transformer models, including Large Language Models (LLMs).
[ECCV 2024] Taming Lookup Tables for Efficient Image Retouching
Official implementation of the CVPR 2024 paper ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions.
CAMixerSR: Only Details Need More “Attention” (CVPR 2024)
[CVPR 2024 Highlight] DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models
Effective Fusion Factor in FPN for Tiny Object Detection (WACV 2021)
[NeurIPS 2022] HorNet: Efficient High-Order Spatial Interactions with Recursive Gated Convolutions
Official code for Conformer: Local Features Coupling Global Representations for Visual Recognition