- Stability.ai, Eleuther.ai
- Seattle, WA
- https://dmarx.github.io
- @DigThatData
ML Performance
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
MTEB: Massive Text Embedding Benchmark
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Odysseus: Playground of LLM Sequence Parallelism
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch that enables using different hardware executors at once, across one or thousands of GPUs.
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
NVIDIA Math Libraries for the Python Ecosystem
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …
Modin: Scale your Pandas workflows by changing a single line of code
Enjoy the magic of Diffusion models!
IREE's PyTorch Frontend, based on Torch Dynamo.
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
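As a hedged illustration of what one of the simplest cache-compression policies looks like (a "keep only recent tokens" eviction window), here is a pure-Python sketch. The class and method names are hypothetical, for illustration only, and are not Cold Compress's actual API:

```python
from collections import deque

class WindowKVCache:
    """Toy KV cache that keeps only the last `window` (key, value) pairs.

    This mimics the simplest cache-compression baseline: evict the
    oldest entries once the window is full. Names are illustrative.
    """

    def __init__(self, window):
        self.window = window
        self.cache = deque(maxlen=window)  # oldest entry auto-evicted

    def append(self, key, value):
        self.cache.append((key, value))

    def contents(self):
        return list(self.cache)
```

Real methods in this space are more selective (e.g. scoring entries by attention weight before evicting), but the interface above captures the basic shape: bounded storage with an eviction rule.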
Large-scale LLM inference engine
A low-latency & high-throughput serving engine for LLMs
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Simple and fast low-bit matmul kernels in CUDA / Triton
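The core idea behind low-bit matmul kernels can be sketched in pure Python (the real kernels do this in CUDA/Triton with vectorized integer instructions): quantize float inputs to int8, multiply in integer arithmetic, then rescale the accumulator back to float. Function names here are illustrative, not any library's API:

```python
def quantize(row):
    """Symmetric per-row int8 quantization: returns (int8 values, scale)."""
    amax = max(abs(x) for x in row) or 1.0
    scale = amax / 127.0
    return [round(x / scale) for x in row], scale

def int8_matmul(A, B_cols):
    """Multiply A (m x k floats) by B, given as its columns (n lists of k floats).

    Each dot product runs in integer arithmetic; the two per-row scales
    rescale the integer accumulator back to an approximate float result.
    """
    qB = [quantize(col) for col in B_cols]
    out = []
    for row in A:
        qa, sa = quantize(row)
        out.append([sa * sb * sum(a * b for a, b in zip(qa, qb))
                    for qb, sb in qB])
    return out
```

The result is approximate (quantization rounds each entry to one of 255 levels per row), which is the accuracy/speed trade-off these kernels make.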
A fast multi-core implementation of HDBSCAN for low dimensional Euclidean spaces
Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP.