
Starred repositories


AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:

Python 1,510 171 Updated Jul 31, 2024
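The core idea behind 4-bit weight quantizers like AutoAWQ can be sketched in a few lines. This is an illustrative toy (function names are invented, not AutoAWQ's API): weights are split into small groups, and each group is symmetrically rounded to signed 4-bit integers with one floating-point scale per group.

```python
# Toy sketch of group-wise symmetric 4-bit quantization (illustrative only,
# not AutoAWQ's actual implementation).

def quantize_group(weights, n_bits=4):
    """Quantize one group of weights to signed n-bit integers plus a scale."""
    qmax = 2 ** (n_bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize_group(q, scale):
    """Recover approximate float weights from the quantized group."""
    return [v * scale for v in q]

w = [0.12, -0.7, 0.33, 0.05]
q, s = quantize_group(w)
w_hat = dequantize_group(q, s)  # each error is at most about scale / 2
```

AWQ's contribution on top of this baseline is choosing per-channel scales from activation statistics so that the weights that matter most for accuracy lose the least precision.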

An easy-to-use LLMs quantization package with user-friendly apis, based on GPTQ algorithm.

Python 4,177 435 Updated Jul 26, 2024

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,126 130 Updated Jul 12, 2024
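The scale-migration trick at the heart of SmoothQuant can be sketched as follows. This is a toy version (names and the alpha=0.5 choice are illustrative, not the paper's code): per-channel scales divide the activations and multiply the corresponding weight rows, leaving the matmul result mathematically unchanged while shrinking activation outliers.

```python
# Toy sketch of SmoothQuant-style scale migration: X @ W == (X / s) @ (s * W),
# with s chosen per input channel to balance activation and weight ranges.

def smooth(X, W, alpha=0.5):
    """X: activations [n][c]; W: weights [c][m]. Returns (X / s, s * W)."""
    c = len(W)
    s = []
    for j in range(c):
        a_max = max(abs(X[i][j]) for i in range(len(X))) or 1.0
        w_max = max(abs(v) for v in W[j]) or 1.0
        s.append(a_max ** alpha / w_max ** (1 - alpha))
    Xs = [[X[i][j] / s[j] for j in range(c)] for i in range(len(X))]
    Ws = [[v * s[j] for v in W[j]] for j in range(c)]
    return Xs, Ws

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]
```

After smoothing, both the rescaled activations and weights have tamer per-channel ranges, which is what makes plain 8-bit quantization of both sides viable.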

SGLang is yet another fast serving framework for large language models and vision language models.

Python 3,578 220 Updated Jul 31, 2024

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilizatio…

Python 1,690 273 Updated Jul 31, 2024

Python bindings for llama.cpp

Python 7,332 879 Updated Jul 31, 2024

Llama 2 inference

C 32 5 Updated Nov 4, 2023

Get up and running with Llama 3.1, Mistral, Gemma 2, and other large language models.

Go 82,405 6,294 Updated Jul 31, 2024

MindSpore online courses: Step into LLM

Jupyter Notebook 388 82 Updated Jun 14, 2024

Code for the book 《CUDA编程基础与实践》 (CUDA Programming: Basics and Practice)

Cuda 72 18 Updated Apr 28, 2022

Sample codes for my CUDA programming book

Cuda 1,476 311 Updated Jul 27, 2023

A hands-on implementation of an LLM inference framework

C++ 96 16 Updated Jul 30, 2024

Samples for CUDA Developers which demonstrates features in CUDA Toolkit

C 5,863 1,717 Updated Jul 26, 2024

LightSeq: A High Performance Library for Sequence Processing and Generation

C++ 3,142 324 Updated May 16, 2023

Learn CUDA Programming, published by Packt

Cuda 959 224 Updated Dec 30, 2023

Material for cuda-mode lectures

Jupyter Notebook 2,001 196 Updated Jun 13, 2024

Train an LLM from scratch on a single 24 GB GPU

Python 43 6 Updated Jun 27, 2024

LLM inference in C/C++

C++ 62,778 9,008 Updated Jul 31, 2024

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Python 1,792 168 Updated Jul 30, 2024

[EMNLP 2023 Industry Track] A simple prompting approach that enables the LLMs to run inference in batches.

Python 63 5 Updated Mar 8, 2024
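The batch-prompting idea named above can be sketched without any model at all: pack several questions into one prompt under a numbered convention, then split the single completion back into per-question answers. This is a rough illustrative sketch (function names and the Q[i]/A[i] convention are invented here, not the paper's exact format).

```python
# Toy sketch of batch prompting: one prompt carries many questions, and the
# model is asked to answer them under a parseable A[i]: convention.

def build_batch_prompt(questions):
    """Pack questions into a single numbered prompt."""
    lines = [f"Q[{i + 1}]: {q}" for i, q in enumerate(questions)]
    lines.append("Answer each question as A[i]: on its own line.")
    return "\n".join(lines)

def split_batch_answer(completion, n):
    """Split one completion back into n per-question answers."""
    answers = [""] * n
    for line in completion.splitlines():
        if line.startswith("A[") and "]:" in line:
            idx = int(line[2:line.index("]")]) - 1
            if 0 <= idx < n:
                answers[idx] = line.split("]:", 1)[1].strip()
    return answers
```

The payoff is amortization: the shared instructions and few-shot examples are encoded once per batch instead of once per question.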

📰 Must-read papers and blogs on Speculative Decoding ⚡️

285 12 Updated Jul 30, 2024
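The accept/reject rule at the core of speculative decoding fits in a few lines. The sketch below is illustrative (not from any listed repo): a cheap draft model proposes a token from its distribution q, and the target model's distribution p accepts it with probability min(1, p/q); on rejection, a replacement is drawn from the residual max(0, p − q), which keeps the overall sampling distribution exactly p.

```python
# Toy sketch of the speculative-decoding acceptance rule over dict-based
# token distributions (token -> probability).

import random

def accept_or_resample(draft_token, p, q, rng=random.random):
    """Accept the draft token with prob min(1, p/q); else resample residual."""
    if rng() < min(1.0, p.get(draft_token, 0.0) / q[draft_token]):
        return draft_token
    # Rejected: sample from the normalized residual distribution max(0, p - q).
    residual = {t: max(0.0, p.get(t, 0.0) - q.get(t, 0.0)) for t in p}
    total = sum(residual.values())
    r = rng() * total
    for t, w in residual.items():
        r -= w
        if r <= 0:
            return t
    return draft_token
```

In a real system the draft model proposes several tokens at once and the target model verifies them in a single forward pass, which is where the speedup comes from.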

Sequence Parallel Attention for Long Context LLM Model Training and Inference

Python 241 9 Updated Jun 27, 2024

SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Python 231 21 Updated Aug 7, 2023

A fast and user-friendly runtime for transformer inference (BERT, ALBERT, GPT-2, decoders, etc.) on CPU and GPU.

C++ 1,458 194 Updated Jun 12, 2023

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) on multi-GPU Clusters

Python 135 14 Updated Jul 30, 2024

A one-click assistant for the daily tasks of Arknights (《明日方舟》), supporting all clients.

C++ 12,887 1,711 Updated Jul 31, 2024

Disaggregated serving system for Large Language Models (LLMs).

Jupyter Notebook 211 17 Updated Jun 14, 2024

Run any open-source LLMs, such as Llama 3.1, Gemma, as OpenAI compatible API endpoint in the cloud.

Python 9,466 605 Updated Jul 30, 2024

LLM training in simple, raw C/CUDA

Cuda 22,392 2,484 Updated Jul 30, 2024