- Stability.ai, Eleuther.ai
- Seattle, WA
- https://dmarx.github.io
- @DigThatData
ML Performance
(NeurIPS 2024 Oral 🔥) Improved Distribution Matching Distillation for Fast Image Synthesis
MTEB: Massive Text Embedding Benchmark
Pruner-Zero: Evolving Symbolic Pruning Metric from scratch for LLMs
Official implementation of ⚡ Flash Diffusion ⚡: Accelerating Any Conditional Diffusion Model for Few Steps Image Generation
This is the official repository for the paper "Flora: Low-Rank Adapters Are Secretly Gradient Compressors" in ICML 2024.
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters
Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
Odysseus: Playground of LLM Sequence Parallelism
Make PyTorch models up to 40% faster! Thunder is a source-to-source compiler for PyTorch that enables using different hardware executors at once, across one or thousands of GPUs.
OpenDiLoCo: An Open-Source Framework for Globally Distributed Low-Communication Training
NVIDIA Math Libraries for the Python Ecosystem
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. In ICML 2024.
Up to 200x Faster Dot Products & Similarity Metrics — for Python, Rust, C, JS, and Swift, supporting f64, f32, f16 real & complex, i8, and bit vectors using SIMD for both AVX2, AVX-512, NEON, SVE, …
Modin: Scale your Pandas workflows by changing a single line of code
Enjoy the magic of Diffusion models!
IREE's PyTorch Frontend, based on Torch Dynamo.
Cold Compress is a hackable, lightweight, and open-source toolkit for creating and benchmarking cache compression methods built on top of GPT-Fast, a simple, PyTorch-native generation codebase.
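As a hedged illustration of what one of the simplest cache-compression policies looks like (a "keep only recent tokens" eviction window), here is a pure-Python sketch. The class and method names are hypothetical, for illustration only, and are not Cold Compress's actual API:

```python
from collections import deque

class WindowKVCache:
    """Toy KV cache that keeps only the last `window` (key, value) pairs.

    This mimics the simplest cache-compression baseline: evict the
    oldest entries once the window is full. Names are illustrative.
    """

    def __init__(self, window):
        self.window = window
        self.cache = deque(maxlen=window)  # oldest entry auto-evicted

    def append(self, key, value):
        self.cache.append((key, value))

    def contents(self):
        return list(self.cache)
```

Real methods in this space are more selective (e.g. scoring entries by attention weight before evicting), but the interface above captures the basic shape: bounded storage with an eviction rule.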
Large-scale LLM inference engine
A low-latency & high-throughput serving engine for LLMs
The RunPod worker template for serving our large language model endpoints. Powered by vLLM.
Simple and fast low-bit matmul kernels in CUDA / Triton
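The core idea behind low-bit matmul kernels can be sketched in pure Python (the real kernels do this in CUDA/Triton with vectorized integer instructions): quantize float inputs to int8, multiply in integer arithmetic, then rescale the accumulator back to float. Function names here are illustrative, not any library's API:

```python
def quantize(row):
    """Symmetric per-row int8 quantization: returns (int8 values, scale)."""
    amax = max(abs(x) for x in row) or 1.0
    scale = amax / 127.0
    return [round(x / scale) for x in row], scale

def int8_matmul(A, B_cols):
    """Multiply A (m x k floats) by B, given as its columns (n lists of k floats).

    Each dot product runs in integer arithmetic; the two per-row scales
    rescale the integer accumulator back to an approximate float result.
    """
    qB = [quantize(col) for col in B_cols]
    out = []
    for row in A:
        qa, sa = quantize(row)
        out.append([sa * sb * sum(a * b for a, b in zip(qa, qb))
                    for qb, sb in qB])
    return out
```

The result is approximate (quantization rounds each entry to one of 255 levels per row), which is the accuracy/speed trade-off these kernels make.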
A fast multi-core implementation of HDBSCAN for low dimensional Euclidean spaces
Infinity is a high-throughput, low-latency REST API for serving text embeddings, reranking models, and CLIP.