- SiliconFlow
- Neverland
- https://mard1no.github.io/
Starred repositories
- SGLang is yet another fast serving framework for large language models and vision language models.
- Utilities intended for use with Llama models.
- QQQ is an innovative and hardware-optimized W4A8 quantization solution.
- AMD's C++ library for accelerating tensor primitives.
- Debug print operator for CUDA graph debugging.
- ms-swift: use PEFT or full-parameter training to fine-tune 300+ LLMs or 50+ MLLMs (Qwen2, GLM4v, InternLM2.5, Yi, Llama 3.1, LLaVA-Video, InternVL2, MiniCPM-V, DeepSeek, Baichuan2, Gemma2, Phi-3-Vision, ...).
- Integrate MS-AMP into nanoGPT (https://github.com/karpathy/nanoGPT).
- Ongoing research training transformer models at scale.
- fanshiqing/grouped_gemm (forked from tgale96/grouped_gemm): PyTorch bindings for CUTLASS grouped GEMM.
- Finetune Llama 3.1, Mistral, Phi & Gemma LLMs 2-5x faster with 80% less memory.
- Fast and memory-efficient exact attention.
- A cross-chip platform collection of operators and a unified neural network library.
- BizyAir: Comfy nodes that can run in any environment.
- A fast communication-overlapping library for tensor parallelism on GPUs.
- NVIDIA Math Libraries for the Python Ecosystem.
- Efficient operator implementations based on the Cambricon Machine Learning Unit (MLU).
- PaddleAPEX: Paddle Accuracy and Performance EXpansion pack.
- A large-scale simulation framework for LLM inference.
- YaRN: Efficient Context Window Extension of Large Language Models.
- Shared Middle-Layer for Triton Compilation.