Stars
FlashInfer: Kernel Library for LLM Serving
Official inference library for Mistral models
An extremely fast Python linter and code formatter, written in Rust.
Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR.
A library to analyze PyTorch traces.
Super-Efficient RLHF Training of LLMs with Parameter Reallocation
🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
Simple and efficient PyTorch-native transformer text generation in <1000 LOC of Python.
Taming Transformers for High-Resolution Image Synthesis
Official JAX implementation of MaskGIT
Pandora: Towards General World Model with Natural Language Actions and Video States
Profiling and inspecting memory in PyTorch
Easy Parallel Library (EPL) is a general and efficient deep learning framework for distributed model training.
An application-focused API for memory management on NUMA & GPU architectures
Task-based datasets, preprocessing, and evaluation for sequence models.
Efficient GPU support for LLM inference with x-bit quantization (e.g. FP6, FP5).
A retargetable MLIR-based machine learning compiler and runtime toolkit.
A fast implementation of T5/UL2 in PyTorch using Flash Attention
Building a quick conversation-based search demo with Lepton AI.
Open standard for machine learning interoperability
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…
StreamDiffusion: A Pipeline-Level Solution for Real-Time Interactive Generation