- Moscow, Russia
- @dvmazur
Starred repositories
Triton-based implementation of Sparse Mixture of Experts.
Trio – a friendly Python library for async concurrency and I/O
⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)
prime (previously called ZeroBand) is a framework for efficient, globally distributed training of AI models over the internet.
nsync is a C library that exports various synchronization primitives, such as mutexes
🔥🕷️ Crawl4AI: Open-source LLM Friendly Web Crawler & Scraper
Lightning fast C++/CUDA neural network framework
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Efficient Triton Kernels for LLM Training
A fast communication-overlapping library for tensor parallelism on GPUs.
News and material links related to GPU programming
Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.
FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batch sizes of 16-32 tokens.
Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI
🦜🔗 Build context-aware reasoning applications
Vision utilities for web interaction agents 👀
Fast Inference of MoE Models with CPU-GPU Orchestration
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
Minimalistic large language model 3D-parallelism training
YSDA course in Natural Language Processing
Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization https://arxiv.org/pdf/2401.06118.pdf and PV-Tuning: Beyond Straight-Through Estimation for Ext…
Finetune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory