Starred repositories
A large-scale simulation framework for LLM inference
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with LlamaIndex, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Open-source observability for your LLM application, based on OpenTelemetry
Efficient Triton Kernels for LLM Training
BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Run your own AI cluster at home with everyday devices 📱💻 🖥️⌚
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
SGLang is a fast serving framework for large language models and vision language models.
PyTorch native quantization and sparsity for training and inference
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
A model compilation solution for various hardware
A survey of Code Agents / Foundation Models for improving development productivity. Become 10x SWE, MLE, etc.
Run PyTorch LLMs locally on servers, desktop and mobile
TensorDict is a tensor container dedicated to PyTorch.
siliconflow / triton
Forked from triton-lang/triton
Development repository for the Triton language and compiler
Shared Middle-Layer for Triton Compilation
Agentic components of the Llama Stack APIs
Applied AI experiments and examples for PyTorch
FB (Facebook) + GEMM (General Matrix-Matrix Multiplication) - https://code.fb.com/ml-applications/fbgemm/
This repository contains the experimental PyTorch native float8 training UX
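A collection of industry practice articles on search, recommendation, advertising, user growth, and more (sources: Zhihu, Datafuntalk, technical WeChat official accounts)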
A fast communication-overlapping library for tensor parallelism on GPUs.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving