Stars
Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
📰 Must-read papers and blogs on Speculative Decoding ⚡️
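The draft-then-verify loop those speculative-decoding papers study can be sketched with toy stand-in models. Everything below is made up for illustration: `target` and `draft` are just functions mapping a token prefix to the next token, and verification is greedy and sequential, whereas real systems verify the whole draft in one batched target pass and accept tokens probabilistically.

```python
# Minimal greedy speculative-decoding sketch (hypothetical toy "models":
# each model is a function mapping a token prefix to the next token).

def speculative_decode(target, draft, prefix, k=4, steps=8):
    """Generate `steps` tokens with `target`, using `draft` to propose
    cheap k-token lookaheads that are then verified against `target`."""
    out = list(prefix)
    while len(out) - len(prefix) < steps:
        # 1. Draft k tokens cheaply with the small model.
        proposed = []
        for _ in range(k):
            proposed.append(draft(out + proposed))
        # 2. Verify: keep proposals only while the target agrees.
        for t in proposed:
            if target(out) == t:
                out.append(t)
            else:
                break
        # 3. At the first disagreement (or after accepting the whole
        #    drafted block), emit one token from the target itself.
        out.append(target(out))
    return out[:len(prefix) + steps]

# Toy models: the target emits a fixed repeating cycle and the draft
# matches it perfectly, so every drafted block is accepted.
cycle = [1, 2, 3, 4]
target = lambda seq: cycle[len(seq) % 4]
draft = lambda seq: cycle[len(seq) % 4]

print(speculative_decode(target, draft, [0], k=4, steps=8))
# -> [0, 2, 3, 4, 1, 2, 3, 4, 1]
```

The speedup in the real technique comes from step 2: one forward pass of the large model scores all k drafted tokens at once, so several tokens can be committed per target-model call.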
Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.
Firefly: a training toolkit for large language models, supporting training of Qwen2, Yi1.5, Phi-3, Llama3, Gemma, MiniCPM, Yi, Deepseek, Orion, Xverse, Mixtral-8x7B, Zephyr, Mistral, Baichuan2, Llama2, Llama, Qwen, Baichuan, ChatGLM2, InternLM, Ziya2, Vicuna, Bloom, and other large models
🎉CUDA/C++ notes / hand-written CUDA kernels for LLMs / tech blog, updated irregularly: flash_attn, sgemm, sgemv, warp reduce, block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, hist, etc.
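Two of the kernels on that list (softmax and RMSNorm) have simple scalar reference forms; a pure-Python sketch like the one below is useful as a correctness oracle when checking a CUDA implementation, though it is in no way performant, and the `eps` default here is an assumption rather than any particular library's value.

```python
import math

def softmax(xs):
    m = max(xs)                      # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def rmsnorm(xs, weight, eps=1e-6):
    # Root-mean-square normalization with a learned per-element scale.
    rms = math.sqrt(sum(x * x for x in xs) / len(xs) + eps)
    return [w * x / rms for x, w in zip(xs, weight)]

print(softmax([1.0, 2.0, 3.0]))          # sums to 1, largest weight on 3.0
print(rmsnorm([3.0, 4.0], [1.0, 1.0]))
```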
1 min voice data can also be used to train a good TTS model! (few shot voice cloning)
OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
The road to hacking SysML and becoming a systems expert
Systems design is the process of defining the architecture, modules, interfaces, and data for a system to satisfy specified requirements. Systems design could be seen as the application of systems …
🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production
FlashInfer: Kernel Library for LLM Serving
S-LoRA: Serving Thousands of Concurrent LoRA Adapters
A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization.
SGLang is yet another fast serving framework for large language models and vision language models.
MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.
Transformer related optimization, including BERT, GPT
[TMLR 2024] Efficient Large Language Models: A Survey
Fast and memory-efficient exact attention
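The memory efficiency of that exact-attention approach rests on the online-softmax trick: the softmax normalizer and the output are accumulated one key/value pair at a time with a running maximum, so the full row of attention scores is never materialized. A single-query, pure-Python sketch with toy dimensions (the real kernel tiles over blocks of keys and runs on GPU, which this does not attempt):

```python
import math

def attention_online(q, ks, vs):
    """Softmax-weighted average of `vs` by scores q·k, computed in one
    streaming pass using a running max `m` and normalizer `denom`."""
    m = -math.inf
    denom = 0.0
    acc = [0.0] * len(vs[0])
    for k, v in zip(ks, vs):
        s = sum(a * b for a, b in zip(q, k))          # dot-product score
        m_new = max(m, s)
        scale = math.exp(m - m_new) if m != -math.inf else 0.0
        denom = denom * scale + math.exp(s - m_new)   # rescale old mass
        acc = [a * scale + math.exp(s - m_new) * b for a, b in zip(acc, v)]
        m = m_new
    return [a / denom for a in acc]

q = [1.0, 0.0]
ks = [[1.0, 0.0], [0.0, 1.0]]
vs = [[1.0, 2.0], [3.0, 4.0]]
print(attention_online(q, ks, vs))   # matches the naive two-pass softmax result
```

The rescaling by `exp(m - m_new)` is what lets previously accumulated terms stay correct when a larger score arrives later in the stream.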
Awesome LLM compression research papers and tools.
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently.
📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
CLIP (Contrastive Language-Image Pretraining): predict the most relevant text snippet given an image
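At inference time, that prediction reduces to cosine similarity between an image embedding and candidate text embeddings. A toy sketch, with made-up 3-dimensional vectors standing in for CLIP's image and text encoders:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def best_caption(image_emb, captions):
    # captions: list of (text, text_embedding) pairs from a text encoder
    return max(captions, key=lambda c: cosine(image_emb, c[1]))[0]

image_emb = [0.9, 0.1, 0.0]                      # made-up "dog photo" embedding
captions = [("a photo of a dog", [1.0, 0.0, 0.1]),
            ("a photo of a cat", [0.0, 1.0, 0.1])]
print(best_caption(image_emb, captions))         # -> a photo of a dog
```

Real CLIP additionally L2-normalizes embeddings and applies a learned temperature before the softmax used in training, but ranking candidates by cosine similarity is the core of zero-shot retrieval.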
Learning material for CMU10-714: Deep Learning System
Welcome to the "LLM-travel" repository! Explore the inner workings of large language models (LLMs) 🚀. Dedicated to deeply understanding, discussing, and implementing the techniques, principles, and applications of large models.
Development repository for the Triton language and compiler
Medusa: Simple Framework for Accelerating LLM Generation with Multiple Decoding Heads