dmarx

Organizations

@pytti-tools

Stars

ML Performance

638 repositories

(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis

Python 466 25 Updated Sep 27, 2024

MTEB: Massive Text Embedding Benchmark

Jupyter Notebook 1,861 250 Updated Oct 10, 2024

Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs

Python 69 5 Updated Jun 14, 2024

Python 43 4 Updated Jun 14, 2024

Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation

Python 459 33 Updated Jul 3, 2024

UNet diffusion model in pure CUDA

Cuda 566 28 Updated Jun 28, 2024

LLM training in simple, raw C/CUDA

Cuda 84 6 Updated May 1, 2024

LOMO: LOw-Memory Optimization

Python 976 68 Updated Jul 2, 2024

This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.

Python 58 3 Updated Jul 1, 2024

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 567 50 Updated Oct 9, 2024

Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM

Python 549 45 Updated Oct 10, 2024

Odysseus: Playground of LLM Sequence Parallelism

Python 50 1 Updated Jun 17, 2024

Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch. It enables using different hardware executors at once, across one or thousands of GPUs.

Python 1,156 76 Updated Oct 10, 2024

OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training

Python 239 18 Updated Sep 26, 2024

NVIDIA Math Libraries for the Python Ecosystem

Cython 200 9 Updated Jul 8, 2024

MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024).

Python 952 52 Updated Sep 3, 2024

Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …

C 923 52 Updated Oct 9, 2024

Modin: Scale your Pandas workflows by changing a single line of code

Python 9,831 651 Updated Sep 21, 2024

Enjoy the magic of Diffusion models!

Python 6,413 577 Updated Oct 10, 2024

IREE's PyTorch Frontend, based on Torch Dynamo.

Python 46 23 Updated Oct 10, 2024

Real-time neural network inferencing

C++ 582 58 Updated Sep 30, 2024

Scale LLM Engine public repository

Python 776 54 Updated Oct 10, 2024

Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.

Python 84 8 Updated Aug 9, 2024

Large-scale LLM inference engine

Python 1,040 116 Updated Oct 9, 2024

A low-latency & high-throughput serving engine for LLMs

Python 203 27 Updated Sep 12, 2024

Whisper with Medusa heads

Python 791 49 Updated Sep 30, 2024

The RunPod worker template for serving our large language model endpoints. Powered by vLLM.

Python 233 91 Updated Oct 8, 2024

Simple and fast low-bit matmul kernels in CUDA / Triton

Python 98 7 Updated Oct 10, 2024

A fast multi-core implementation of HDBSCAN for low dimensional Euclidean spaces

Python 90 8 Updated Oct 1, 2024

Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP

Python 1,346 100 Updated Oct 10, 2024