MARD1NO
Starred repositories

C++ 113 38 Updated Jul 30, 2024
4 Updated Jul 25, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python 3,537 219 Updated Jul 30, 2024

Utilities intended for use with Llama models.

Python 2,875 390 Updated Jul 30, 2024

The Tensor (or Array)

Python 280 21 Updated Jul 30, 2024

QQQ is an innovative and hardware-optimized W4A8 quantization solution.

Python 32 2 Updated Jul 24, 2024

AMD’s C++ library for accelerating tensor primitives

C++ 33 16 Updated Jul 29, 2024

Debug print operator for cudagraph debugging

Cuda 7 Updated Jul 29, 2024

RAND library for HIP programming language

C++ 107 66 Updated Jul 26, 2024

ms-swift: Use PEFT or Full-parameter to finetune 300+ LLMs or 50+ MLLMs. (Qwen2, GLM4v, Internlm2.5, Yi, Llama3.1, Llava-Video, Internvl2, MiniCPM-V, Deepseek, Baichuan2, Gemma2, Phi3-Vision, ...)

Python 2,619 236 Updated Jul 30, 2024

Integrate MS-AMP into nanoGPT (https://github.com/karpathy/nanoGPT)

Python 1 Updated Jul 19, 2024

Ongoing research training transformer models at scale

Python 9,537 2,154 Updated Jul 30, 2024

PyTorch bindings for CUTLASS grouped GEMM.

Cuda 39 18 Updated Jul 18, 2024

KvikIO - High Performance File IO

Python 134 48 Updated Jul 30, 2024

Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory

Python 13,341 883 Updated Jul 30, 2024

Fast and memory-efficient exact attention

Python 1 Updated Jul 16, 2024

This is a cross-chip platform collection of operators and a unified neural network library.

Python 12 1 Updated Nov 3, 2023

BizyAir: Comfy Nodes that can run in any environment.

Python 100 8 Updated Jul 30, 2024

Brand new TTS solution

Python 6,618 515 Updated Jul 30, 2024

A fast communication-overlapping library for tensor parallelism on GPUs.

C++ 104 9 Updated Jul 25, 2024

NVIDIA Math Libraries for the Python Ecosystem

Cython 185 6 Updated Jul 8, 2024

Efficient operation implementation based on the Cambricon Machine Learning Unit (MLU).

C++ 94 102 Updated Jul 30, 2024

CloudAI Benchmark Framework

Python 21 11 Updated Jul 30, 2024

PaddleAPEX: Paddle Accuracy and Performance EXpansion pack

Python 7 5 Updated Jul 22, 2024
2 Updated Jul 29, 2024

A large-scale simulation framework for LLM inference

Python 162 18 Updated Jul 28, 2024

YaRN: Efficient Context Window Extension of Large Language Models

Python 1,272 112 Updated Apr 17, 2024

LLM101n: Let's build a Storyteller

26,092 1,390 Updated Jul 29, 2024

Shared Middle-Layer for Triton Compilation

MLIR 141 27 Updated Jul 29, 2024