Beijing (UTC+08:00) · Google Scholar: https://scholar.google.com/citations?user=CSpaTt8AAAAJ
Lists (28)
3d-photo
AIGC-2D
AIGC-3D
Algorithm
BigModel
C++Dev
Deepfake
diffusion models
Games
image generate
inpainting
LearningBook
LLM
Mesh reconstruction
Mesh tools
model-compression-acceleration
Model-Inference
Nerf
Optical flow
PBR
Segmentation
SR
Stereo Matching
Style transfer
Texture process
Tools
transformer
tutorials
Starred repositories
The official GitHub page for the survey paper "A Survey of Large Language Models".
Ongoing research training transformer models at scale
niyunsheng / EMS-SD
Forked from NVIDIA/FasterTransformer. arXiv preprint: https://arxiv.org/abs/2405.07542
Implementation of Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
📰 Must-read papers and blogs on Speculative Decoding ⚡️
Spec-Bench: A Comprehensive Benchmark and Unified Evaluation Platform for Speculative Decoding (ACL 2024 Findings)
The Triton Inference Server provides an optimized cloud and edge inferencing solution.
A scalable and robust tree-based speculative decoding algorithm
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
[ICML'24] The official implementation of “Rethinking Optimization and Architecture for Tiny Language Models”
Simple and efficient pytorch-native transformer text generation in <1000 LOC of python.
Official Implementation of EAGLE-1 (ICML'24) and EAGLE-2 (EMNLP'24)
[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
General technology for enabling AI capabilities w/ LLMs and MLLMs
Code associated with the paper "Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding"
REST: Retrieval-Based Speculative Decoding, NAACL 2024
[ICLR 2024] Sheared LLaMA: Accelerating Language Model Pre-training via Structured Pruning
Official release of InternLM2.5 base and chat models. 1M context support
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Fast and memory-efficient exact attention
An easy to use PyTorch to TensorRT converter
Large Language Model Text Generation Inference
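Many of the repositories above (EMS-SD, Kangaroo, Spec-Bench, EAGLE, Medusa, REST, Lookahead Decoding, Draft & Verify) center on speculative decoding. As a minimal sketch of the core idea, not the method of any specific repository listed, here a cheap draft model proposes a block of tokens and a target model verifies them left to right, keeping the longest agreeing prefix plus one corrected token; both "models" are toy deterministic next-token functions over integer token IDs, purely illustrative:

```python
# Toy stand-ins for a small draft model and a large target model.
# Real systems use neural LMs; these deterministic functions are
# illustrative only.

def draft_next(tokens):
    # Cheap draft model: guesses next token as (last + 1) mod 10.
    return (tokens[-1] + 1) % 10

def target_next(tokens):
    # Target model: agrees with the draft except right after token 4.
    if tokens[-1] == 4:
        return 0
    return (tokens[-1] + 1) % 10

def speculative_decode(prompt, num_tokens, k=4):
    """Greedy speculative decoding: the draft proposes k tokens,
    the target verifies them in order, accepting the longest
    agreeing prefix and substituting its own token on mismatch."""
    out = list(prompt)
    while len(out) - len(prompt) < num_tokens:
        # 1) Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # 2) Target verifies each proposed token in turn.
        ctx = list(out)
        for t in proposal:
            expected = target_next(ctx)
            if expected == t:
                ctx.append(t)          # accepted
            else:
                ctx.append(expected)   # reject the rest, keep target's token
                break
        out = ctx
    return out[len(prompt) :][:num_tokens]
```

With greedy (argmax) verification as above, the output is token-for-token identical to decoding with the target model alone, which is why such schemes are called lossless: the draft only changes how many target invocations are needed, not what is generated.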