Stars
Sort by: Recently starred
Firefly: a training toolkit for large language models, supporting Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other models
🎉 CUDA notes / hand-written CUDA kernels for LLMs / C++ notes, updated sporadically: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
One minute of voice data is enough to train a good TTS model! (few-shot voice cloning)
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
The road to hacking SysML and becoming a systems expert
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems …
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
FlashInfer: Kernel Library for LLM Serving
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Transformer-related optimizations, including BERT and GPT
[TMLR 2024] Efficient Large Language Models: A Survey
Fast and memory-efficient exact attention
GPT4All: Chat with Local LLMs on Any Device
Awesome LLM compression research papers and tools.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
Learning material for CMU10-714: Deep Learning System
Welcome to the "LLM-travel" repository! Explore the inner workings of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications of large models.
Development repository for the Triton language and compiler
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads
Chinese translation, slides, and labs for Professor Eijkhout's Introduction to HPC.
Ongoing research training transformer models at scale
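A recurring theme across the starred repositories above (the CUDA kernel notes, FlashAttention, and the various inference engines) is the softmax, whose naive form overflows for large logits. As a minimal illustration, not taken from any of the listed repos, here is a numerically stable softmax sketch in NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row max before exponentiating so exp() cannot
    # overflow for large logits; the result is mathematically unchanged
    # because the shift cancels in the numerator and denominator.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

# Second row would overflow float64 exp() without the max-subtraction trick.
logits = np.array([[1.0, 2.0, 3.0],
                   [1001.0, 1002.0, 1003.0]])
probs = softmax(logits)
# Each row sums to 1, and both rows are identical (shift invariance).
```

The same max-subtraction trick underlies the online softmax used in fused attention kernels such as FlashAttention, where the running max and sum are updated tile by tile instead of over the whole row.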