Stars
Framework to reduce autotune overhead to zero for well-known deployments.
Simple and fast low-bit matmul kernels in CUDA / Triton
SGLang is a fast serving framework for large language models and vision language models.
Efficient Triton Kernels for LLM Training
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
A large-scale information-rich web dataset, featuring millions of real clicked query-document labels
Execute a Jupyter notebook, fast, without needing Jupyter
Scripts and baselines for Spider: Yale's complex and cross-domain semantic parsing and text-to-SQL challenge
BitBLAS is a library to support mixed-precision matrix multiplications, especially for quantized LLM deployment.
Shoggoth is a peer-to-peer network for publishing and distributing open-source Artificial Intelligence
Math OCR model that outputs LaTeX and markdown
Convert PDF to markdown quickly with high accuracy
Official implementation of Half-Quadratic Quantization (HQQ)
A curated list of awesome Mojo 🔥 frameworks, libraries, software and resources
Create Open XML PowerPoint documents in Python
OpenChat: Advancing Open-source Language Models with Imperfect Data
Fast, Modern, Memory Efficient, and Low Precision PyTorch Optimizers
Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…
🛰️ An approximate nearest-neighbor search library for Python and Java with a focus on ease of use, simplicity, and deployability.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
A collection of datasets that pair questions with SQL queries.
A programming framework for agentic AI 🤖
Large Language Model Text Generation Inference