View wxd000000's full-sized avatar


Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.

Jupyter Notebook 35,122 3,691 Updated Jul 28, 2024

📰 Must-read papers and blogs on Speculative Decoding ⚡️

291 12 Updated Aug 5, 2024

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

948 20 Updated Jul 31, 2024

Firefly: a large-model training tool that supports training Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models

Python 5,404 487 Updated Jul 16, 2024

🎉 CUDA/C++ notes / hand-written CUDA kernels for large models / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.

Cuda 946 91 Updated Jul 29, 2024

1 min voice data can also be used to train a good TTS model! (few shot voice cloning)

Python 30,250 3,476 Updated Aug 4, 2024

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

C++ 5,818 661 Updated Aug 3, 2024

The road to hack SysML and become a system expert

Emacs Lisp 389 49 Updated Aug 4, 2024

Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems …

Shell 4,877 1,375 Updated Mar 6, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 32,462 3,907 Updated Jul 25, 2024

FlashInfer: Kernel Library for LLM Serving

Cuda 931 84 Updated Aug 4, 2024

S-LoRA: Serving Thousands of Concurrent LoRA Adapters

Python 1,658 85 Updated Jan 21, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,704 280 Updated Aug 5, 2024

SGLang is yet another fast serving framework for large language models and vision language models.

Python 3,854 234 Updated Aug 5, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,798 169 Updated Aug 5, 2024

Transformer related optimization, including BERT, GPT

C++ 5,700 879 Updated Mar 27, 2024

[TMLR 2024] Efficient Large Language Models: A Survey

885 75 Updated Aug 2, 2024

Fast and memory-efficient exact attention

Python 12,758 1,143 Updated Aug 2, 2024

GPT4All: Chat with Local LLMs on Any Device

C++ 68,288 7,490 Updated Aug 5, 2024

Official Implementation of EAGLE-1 and EAGLE-2

Python 688 69 Updated Jul 30, 2024

Awesome LLM compression research papers and tools.

936 55 Updated Aug 2, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,798 851 Updated Aug 2, 2024

📖 A curated list of Awesome LLM Inference papers with code: TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention, etc.

2,130 145 Updated Aug 5, 2024

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Python 19,307 2,456 Updated Jul 15, 2024

CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image

Jupyter Notebook 24,020 3,162 Updated Jul 23, 2024

Learning material for CMU10-714: Deep Learning System

Jupyter Notebook 182 29 Updated May 12, 2024

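Welcome to the "LLM-travel" repository! Explore the mysteries of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications related to large models.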

Jupyter Notebook 233 28 Updated Jul 21, 2024

Development repository for the Triton language and compiler

C++ 12,160 1,458 Updated Aug 5, 2024

Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads

Jupyter Notebook 2,074 135 Updated Jun 25, 2024