
[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 1,126 130 Updated Jul 12, 2024
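SmoothQuant's core idea is to migrate quantization difficulty from activation outliers into the weights via a per-channel scale, leaving the layer's output mathematically unchanged. A minimal NumPy sketch of that smoothing step (function name and the toy outlier channel are illustrative, not from the repo):

```python
import numpy as np

def smooth_scales(X, W, alpha=0.5):
    # Per-input-channel smoothing factor: s_j = max|X_j|^alpha / max|W_j|^(1-alpha)
    act_max = np.abs(X).max(axis=0)   # per-channel activation range
    w_max = np.abs(W).max(axis=1)     # per-input-channel weight range
    return (act_max ** alpha) / (w_max ** (1 - alpha))

rng = np.random.default_rng(0)
# Channel 2 carries synthetic outliers, mimicking the activation spikes
# that make naive INT8 activation quantization lossy.
X = rng.normal(size=(4, 8)) * np.array([1, 1, 50, 1, 1, 1, 1, 1])
W = rng.normal(size=(8, 16))

s = smooth_scales(X, W)
X_s, W_s = X / s, W * s[:, None]      # migrate difficulty into the weights

# The linear layer's output is unchanged: (X / s) @ (diag(s) W) == X @ W ...
assert np.allclose(X @ W, X_s @ W_s)
# ... but the activation outlier channel is now much tamer to quantize.
assert np.abs(X_s).max() < np.abs(X).max()
```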

[ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

Python 8 1 Updated Dec 13, 2023

RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.

C++ 468 41 Updated Jul 27, 2024

Development repository for the Triton language and compiler

C++ 12,127 1,450 Updated Jul 31, 2024

Tensor library for machine learning

C++ 10,470 973 Updated Jul 31, 2024

High-speed Large Language Model Serving on PCs with Consumer-grade GPUs

C++ 7,711 395 Updated Jul 15, 2024

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,736 842 Updated Jul 30, 2024

This is a series of GPU optimization topics. Here we will introduce how to optimize the CUDA kernel in detail. I will introduce several basic kernel optimizations, including: elementwise, reduce, s…

Cuda 773 121 Updated Jul 29, 2023
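The reduce topic in that series centers on replacing a serial sum with a tree-shaped reduction whose stride doubles each step, which is what a CUDA block implements in shared memory. A plain-Python sketch of just the access pattern (not CUDA code, and not taken from the repo):

```python
def tree_reduce(vals):
    # Pairwise (tree) reduction: each pass combines elements `stride`
    # apart, halving the number of active positions -- the same schedule
    # a block-level CUDA reduce kernel runs over shared memory.
    vals = list(vals)
    n = len(vals)
    stride = 1
    while stride < n:
        for i in range(0, n, 2 * stride):
            if i + stride < n:
                vals[i] += vals[i + stride]
        stride *= 2
    return vals[0]

assert tree_reduce(range(10)) == sum(range(10))
```

On a GPU the inner loop is what the threads do in parallel, so the whole reduction takes O(log n) passes instead of n serial additions.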

Fast inference from large language models via speculative decoding

Python 441 47 Updated Jul 25, 2024
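Speculative decoding lets a small draft model propose several tokens that the large target model then verifies, keeping the longest correct prefix. A toy greedy sketch of the control flow, with deterministic stand-in "models" (all names here are illustrative; real implementations verify probabilistically):

```python
def speculative_decode(target, draft, prompt, k=4, max_new=8):
    # Greedy speculative decoding sketch: the cheap draft model proposes
    # k tokens; the target model checks them position by position (serially
    # here -- a real engine scores all k positions in one forward pass) and
    # keeps matches, substituting its own token at the first mismatch.
    out = list(prompt)
    while len(out) < len(prompt) + max_new:
        ctx, proposal = list(out), []
        for _ in range(k):
            t = draft(ctx)
            proposal.append(t)
            ctx.append(t)
        for t in proposal:
            correct = target(out)
            out.append(correct)   # accept the match or the target's fix
            if correct != t or len(out) >= len(prompt) + max_new:
                break
    return out

# Toy deterministic "models": the next token is the position index;
# the draft is only right at even positions.
target = lambda ctx: len(ctx)
draft = lambda ctx: len(ctx) if len(ctx) % 2 == 0 else -1

assert speculative_decode(target, draft, [0, 1], max_new=6) == [0, 1, 2, 3, 4, 5, 6, 7]
```

The speedup comes from the verification step: when the draft is often right, each target pass commits several tokens instead of one.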

📖A curated list of Awesome LLM Inference Paper with codes, TensorRT-LLM, vLLM, streaming-llm, AWQ, SmoothQuant, WINT8/4, Continuous Batching, FlashAttention, PagedAttention etc.

2,102 143 Updated Jul 31, 2024

A model compression and acceleration toolbox based on pytorch.

Python 324 40 Updated Jan 12, 2024

How to optimize some algorithms in CUDA.

Cuda 1,284 106 Updated Jul 29, 2024

211 3 Updated Aug 19, 2023

LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.

Python 2,137 183 Updated Jul 31, 2024

Fast and memory-efficient exact attention

Python 12,674 1,132 Updated Jul 30, 2024
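FlashAttention's memory efficiency comes from processing K/V in tiles with an online softmax, so the full N x N score matrix is never materialized yet the result is exact. A NumPy sketch of that running-max/running-normalizer recurrence (a simplified illustration of the idea, not the repo's CUDA kernels):

```python
import numpy as np

def naive_attention(Q, K, V):
    # Reference: materializes the full score matrix.
    S = Q @ K.T
    P = np.exp(S - S.max(axis=1, keepdims=True))
    return (P / P.sum(axis=1, keepdims=True)) @ V

def tiled_attention(Q, K, V, block=4):
    # Online-softmax tiling: visit K/V one block at a time, carrying a
    # running row max `m` and normalizer `norm`, and rescaling the partial
    # output whenever the max improves.
    n = Q.shape[0]
    out = np.zeros((n, V.shape[1]))
    m = np.full(n, -np.inf)
    norm = np.zeros(n)
    for j in range(0, K.shape[0], block):
        S = Q @ K[j:j + block].T                  # scores for this tile only
        m_new = np.maximum(m, S.max(axis=1))
        scale = np.exp(m - m_new)                 # rescale old partial sums
        P = np.exp(S - m_new[:, None])
        norm = norm * scale + P.sum(axis=1)
        out = out * scale[:, None] + P @ V[j:j + block]
        m = m_new
    return out / norm[:, None]

rng = np.random.default_rng(1)
Q = rng.normal(size=(6, 3))
K = rng.normal(size=(10, 3))
V = rng.normal(size=(10, 3))

assert np.allclose(naive_attention(Q, K, V), tiled_attention(Q, K, V))
```

The real kernels fuse this loop into one pass over GPU SRAM tiles; the recurrence above is why they can stay exact while never storing S.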

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,096 531 Updated Jul 24, 2024

LLM inference in C/C++

C++ 62,791 9,010 Updated Jul 31, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 23,938 3,442 Updated Jul 31, 2024

A large-scale 7B pretraining language model developed by BaiChuan-Inc.

Python 5,664 504 Updated Jul 18, 2024

NART ("NART is not A RunTime"), a deep learning inference framework.

Python 37 14 Updated Mar 2, 2023

Official repo for consistency models.

Python 6,041 410 Updated Mar 22, 2024

ChatGLM-6B: An Open Bilingual Dialogue Language Model | Open-Source Bilingual Dialogue Language Model

Python 40,151 5,164 Updated Jun 27, 2024

GLM (General Language Model)

Python 3,136 319 Updated Nov 3, 2023

GLM-130B: An Open Bilingual Pre-Trained Model (ICLR 2023)

Python 7,651 608 Updated Jul 25, 2023

Making large AI models cheaper, faster and more accessible

Python 38,421 4,314 Updated Jul 31, 2024

Simplify your ONNX model

C++ 3,708 378 Updated Jul 8, 2024

Optimized BERT transformer inference on NVIDIA GPUs. https://arxiv.org/abs/2210.03052

C++ 443 33 Updated Mar 15, 2024

FlagPerf is an open-source software platform for benchmarking AI chips.

Python 282 95 Updated Jul 31, 2024

TensorRT Plugin Autogen Tool

Python 365 42 Updated Apr 7, 2023