Starred repositories

Triton-based implementation of Sparse Mixture of Experts.

Python · 184 stars · 14 forks · Updated Oct 10, 2024

Trio – a friendly Python library for async concurrency and I/O

Python · 6,188 stars · 340 forks · Updated Nov 5, 2024

⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training (EMNLP 2024)

Python · 879 stars · 46 forks · Updated Jun 25, 2024

prime (previously called ZeroBand) is a framework for efficient, globally distributed training of AI models over the internet.

Python · 202 stars · 22 forks · Updated Nov 6, 2024

nsync is a C library that exports various synchronization primitives, such as mutexes

C · 1,061 stars · 83 forks · Updated Jul 23, 2024

🔥🕷️ Crawl4AI: Open-source LLM-Friendly Web Crawler & Scraper

Python · 15,566 stars · 1,120 forks · Updated Nov 6, 2024

Lightning fast C++/CUDA neural network framework

C++ · 3,745 stars · 454 forks · Updated Aug 26, 2024

A PyTorch Extension: Tools for easy mixed-precision and distributed training in PyTorch

Python · 8,389 stars · 1,396 forks · Updated Nov 1, 2024

Efficient Triton Kernels for LLM Training

Python · 3,376 stars · 190 forks · Updated Nov 6, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ · 216 stars · 16 forks · Updated Oct 30, 2024

Kernel Tuner

Python · 285 stars · 49 forks · Updated Nov 5, 2024

GPU-programming-related news and material links

1,202 stars · 73 forks · Updated Sep 23, 2024

YaFSDP: Yet another Fully Sharded Data Parallel

Python · 843 stars · 42 forks · Updated Sep 3, 2024

Tensor parallelism is all you need. Run LLMs on an AI cluster at home using any device. Distribute the workload, divide RAM usage, and increase inference speed.

C++ · 1,471 stars · 102 forks · Updated Oct 14, 2024
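The tensor-parallel idea described above — splitting a model's weight matrices across devices so each device stores and multiplies only a shard — can be sketched in a few lines. This is a hypothetical NumPy illustration of the general technique, not code from the repository:

```python
import numpy as np

def tensor_parallel_matmul(x, W, n_devices):
    """Column-parallel matmul: each 'device' holds one column shard of W."""
    # Split W column-wise; per-device RAM for W drops by ~n_devices.
    shards = np.array_split(W, n_devices, axis=1)
    # Each device computes its partial output independently (in parallel
    # on a real cluster).
    partials = [x @ shard for shard in shards]
    # Concatenate partial outputs (an all-gather over the network).
    return np.concatenate(partials, axis=-1)

x = np.random.randn(1, 8)
W = np.random.randn(8, 16)
out = tensor_parallel_matmul(x, W, n_devices=4)
assert np.allclose(out, x @ W)  # matches the unsharded matmul
```

Because each shard multiply is independent, adding devices both divides per-device memory and parallelizes the compute, at the cost of the final gather over the network.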

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups at batch sizes of up to 16-32 tokens.

Python · 610 stars · 47 forks · Updated Sep 4, 2024
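The FP16xINT4 scheme referenced above keeps activations in higher precision while storing weights as 4-bit integers with per-group scales, cutting weight memory roughly 4x. A hypothetical NumPy sketch of weight-only INT4 quantization (not the repository's actual kernel, which fuses dequantization into a GPU matmul):

```python
import numpy as np

def quantize_int4(w, group_size=8):
    """Quantize weights to signed 4-bit ints with one FP16 scale per group."""
    groups = w.reshape(-1, group_size)
    # Map the largest magnitude in each group to +/-7 (4-bit signed range).
    scales = np.abs(groups).max(axis=1, keepdims=True) / 7.0
    q = np.clip(np.round(groups / scales), -8, 7).astype(np.int8)
    return q, scales.astype(np.float16)

def dequantize_int4(q, scales, shape):
    """Reconstruct approximate weights from int4 values and group scales."""
    return (q.astype(np.float32) * scales).reshape(shape)

w = np.random.randn(16, 8).astype(np.float32)
q, s = quantize_int4(w)
w_hat = dequantize_int4(q, s, w.shape)
# Reconstruction error is bounded by half a quantization step per group.
assert np.max(np.abs(w - w_hat)) < 0.2 * np.abs(w).max()
```

The speedup at small batch sizes comes from LLM decoding being memory-bandwidth-bound: reading 4-bit weights moves ~4x less data than FP16 weights.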

Perplexica is an AI-powered search engine and an open-source alternative to Perplexity AI.

TypeScript · 14,515 stars · 1,399 forks · Updated Oct 31, 2024

🦜🔗 Build context-aware reasoning applications

Jupyter Notebook · 94,444 stars · 15,265 forks · Updated Nov 6, 2024

Vision utilities for web interaction agents 👀

Jupyter Notebook · 1,430 stars · 87 forks · Updated Nov 4, 2024

The official Meta Llama 3 GitHub site

Python · 26,980 stars · 3,056 forks · Updated Aug 12, 2024

Fast Inference of MoE Models with CPU-GPU Orchestration

Python · 170 stars · 16 forks · Updated Oct 30, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python · 1,744 stars · 95 forks · Updated Jan 21, 2024

Minimalistic large language model 3D-parallelism training

Python · 1,218 stars · 120 forks · Updated Nov 4, 2024

YSDA course in Natural Language Processing

Jupyter Notebook · 9,792 stars · 2,593 forks · Updated Nov 2, 2024

Official PyTorch repository for Extreme Compression of Large Language Models via Additive Quantization (https://arxiv.org/pdf/2401.06118.pdf) and PV-Tuning: Beyond Straight-Through Estimation for Ext…

Python · 1,166 stars · 175 forks · Updated Nov 5, 2024

Fine-tune Llama 3.2, Mistral, Phi, Qwen & Gemma LLMs 2-5x faster with 80% less memory

Python · 17,822 stars · 1,236 forks · Updated Nov 6, 2024