Stars
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
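The draft-then-verify loop those speculative-decoding papers study can be sketched with toy stand-in models. Everything below is made up for illustration: `target` and `draft` are just functions mapping a token prefix to the next token, and verification is greedy and sequential, whereas real systems verify the whole draft in one batched target pass and accept tokens probabilistically.

```python
# Minimal greedy speculative-decoding sketch (hypothetical toy "models":
# each model is a function mapping a token prefix to the next token).

def speculative_decode(target, draft, prefix, k=4, steps=8):
    """Generate `steps` tokens with `target`, using `draft` to propose
    cheap k-token lookaheads that are then verified against `target`."""
    out = list(prefix)
    while len(out) - len(prefix) < steps:
        # 1. Draft k tokens cheaply with the small model.
        proposed = []
        for _ in range(k):
            proposed.append(draft(out + proposed))
        # 2. Verify: keep proposals only while the target agrees.
        for t in proposed:
            if target(out) == t:
                out.append(t)
            else:
                break
        # 3. At the first disagreement (or after accepting the whole
        #    drafted block), emit one token from the target itself.
        out.append(target(out))
    return out[:len(prefix) + steps]

# Toy models: the target emits a fixed repeating cycle and the draft
# matches it perfectly, so every drafted block is accepted.
cycle = [1, 2, 3, 4]
target = lambda seq: cycle[len(seq) % 4]
draft = lambda seq: cycle[len(seq) % 4]

print(speculative_decode(target, draft, [0], k=4, steps=8))
# -> [0, 2, 3, 4, 1, 2, 3, 4, 1]
```

The speedup in the real technique comes from step 2: one forward pass of the large model scores all k drafted tokens at once, so several tokens can be committed per target-model call.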
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Firefly: a training toolkit for large language models, supporting training of Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
🎉CUDA/C++ notes / hand-written CUDA kernels for LLMs / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
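Two of the kernels on that list (softmax and RMSNorm) have simple scalar reference forms; a pure-Python sketch like the one below is useful as a correctness oracle when checking a CUDA implementation, though it is in no way performant, and the `eps` default here is an assumption rather than any particular library's value.

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def rmsnorm(xs, weight, eps=1e-6):
    # Root-mean-square normalization with a learned per-element scale.
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [w * x / rms for x, w in zip(xs, weight)]

print(softmax([1.0, 2.0, 3.0]))          # sums to 1, largest weight on 3.0
print(rmsnorm([3.0, 4.0], [1.0, 1.0]))
```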
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
The road to hacking SysML and becoming a systems expert
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems …
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
FlashInfer: Kernel Library for LLM Serving
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
SGLang is yet another fast serving framework for large language models and vision language models.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Transformer related optimization, including BERT, GPT
[TMLR 2024] Efficient Large Language Models: A Survey
Fast and memory-efficient exact attention
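The memory efficiency of that exact-attention approach rests on the online-softmax trick: the softmax normalizer and the output are accumulated one key/value pair at a time with a running maximum, so the full row of attention scores is never materialized. A single-query, pure-Python sketch with toy dimensions (the real kernel tiles over blocks of keys and runs on GPU, which this does not attempt):

```python
import math

def attention_online(q, ks, vs):
    """Softmax-weighted average of `vs` by scores q·k, computed in one
    streaming pass using a running max `m` and normalizer `denom`."""
    m = -math.inf
    denom = 0.0
    acc = [0.0] * len(vs[0])
    for k, v in zip(ks, vs):
        s = sum(a * b for a, b in zip(q, k))          # dot-product score
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != -math.inf else 0.0
        denom = denom * scale + math.exp(s - m_new)   # rescale old mass
        acc = [a * scale + math.exp(s - m_new) * b for a, b in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]

q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0]]
print(attention_online(q, ks, vs))   # matches the naive two-pass softmax result
```

The rescaling by `exp(m - m_new)` is what lets previously accumulated terms stay correct when a larger score arrives later in the stream.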
Awesome LLM compression research papers and tools.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
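At inference time, that prediction reduces to cosine similarity between an image embedding and candidate text embeddings. A toy sketch, with made-up 3-dimensional vectors standing in for CLIP's image and text encoders:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_caption(image_emb, captions):
    # captions: list of (text, text_embedding) pairs from a text encoder
    return max(captions, key=lambda c: cosine(image_emb, c[1]))[0]

image_emb = [0.9, 0.1, 0.0]                      # made-up "dog photo" embedding
captions = [("a photo of a dog", [1.0, 0.0, 0.1]),
            ("a photo of a cat", [0.0, 1.0, 0.1])]
print(best_caption(image_emb, captions))         # -> a photo of a dog
```

Real CLIP additionally L2-normalizes embeddings and applies a learned temperature before the softmax used in training, but ranking candidates by cosine similarity is the core of zero-shot retrieval.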
Learning material for CMU10-714: Deep Learning System
Welcome to the "LLM-travel" repository! Explore the inner workings of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications of large models.
Development repository for the Triton language and compiler
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads