NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
PyTorch implementation of Transfusion, "Predict the Next Token and Diffuse Images with One Multi-Modal Model", from Meta AI
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
Helpful tools and examples for working with flex-attention
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
Simple implementation of muP, based on the Spectral Condition for Feature Learning. The implementation is SGD-only; don't use it with Adam.
📖 A curated list of awesome LLM inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, continuous batching, FlashAttention, PagedAttention, etc.
InternEvo is an open-source, lightweight training framework that aims to support model pre-training without extensive dependencies.
A modular graph-based Retrieval-Augmented Generation (RAG) system
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
User-friendly WebUI for AI (Formerly Ollama WebUI)
Zhejiang University Graduation Thesis LaTeX Template
A collection of AWESOME things about mixture-of-experts
Ongoing research training transformer models at scale
Minimal implementation of scalable rectified flow transformers, based on SD3's approach
A playbook for systematically maximizing the performance of deep learning models.
A byte-level decoder architecture that matches the performance of tokenized Transformers.
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
Efficient implementations of state-of-the-art linear attention models in PyTorch and Triton
A Native-PyTorch Library for LLM Fine-tuning
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
PyContinual (An Easy and Extendible Framework for Continual Learning)
An Extensible Continual Learning Framework Focused on Language Models (LMs)