chenbohua3

17 followers · 11 following

@AlibabaPAI

Stars

microsoft / MInference

To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Python · 660 stars · 22 forks · Updated Aug 13, 2024

NVIDIA / cutlass

CUDA Templates for Linear Algebra Subroutines

C++ · 5,216 stars · 877 forks · Updated Aug 20, 2024

kvcache-ai / Mooncake

Mooncake is the serving platform for Kimi, a leading LLM service provided by Moonshot AI.

996 stars · 21 forks · Updated Jul 31, 2024

NVIDIA / TensorRT-Model-Optimizer

TensorRT Model Optimizer is a unified library of state-of-the-art model optimization techniques such as quantization and sparsity. It compresses deep learning models for downstream deployment frame…

Python · 393 stars · 23 forks · Updated Aug 5, 2024

Mooler0410 / LLMsPracticalGuide

A curated list of practical guide resources for LLMs (LLMs Tree, Examples, Papers)

9,189 stars · 697 forks · Updated May 31, 2024

mit-han-lab / qserve

QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving

Python · 378 stars · 16 forks · Updated Aug 13, 2024

33357 / smartcontract-apps

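A repository for the Chinese-speaking community that analyzes the architecture and implementation of smart contract applications on the market.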

Solidity · 1,479 stars · 327 forks · Updated Aug 3, 2024

linexjlin / GPTs

Leaked prompts of GPTs

28,097 stars · 3,764 forks · Updated Jul 9, 2024

hao-ai-lab / LookaheadDecoding

[ICML 2024] Break the Sequential Dependency of LLM Inference Using Lookahead Decoding

Python · 1,074 stars · 63 forks · Updated Feb 14, 2024

NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ · 7,983 stars · 871 forks · Updated Aug 20, 2024

THUDM / CogVLM

A state-of-the-art open visual language model | multimodal pre-trained model

Python · 5,797 stars · 397 forks · Updated May 29, 2024

mit-han-lab / streaming-llm

[ICLR 2024] Efficient Streaming Language Models with Attention Sinks

Python · 6,474 stars · 360 forks · Updated Jul 11, 2024

mit-han-lab / llm-awq

[MLSys 2024 Best Paper Award] AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

Python · 2,256 stars · 169 forks · Updated Jul 16, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 25,296 stars · 3,656 forks · Updated Aug 23, 2024

intel / xFasterTransformer

C++ · 335 stars · 58 forks · Updated Aug 21, 2024

horseee / Awesome-Efficient-LLM

A curated list for Efficient Large Language Models

Python · 1,058 stars · 75 forks · Updated Aug 22, 2024

triton-lang / triton

Development repository for the Triton language and compiler

C++ · 12,308 stars · 1,487 forks · Updated Aug 23, 2024

Xwin-LM / Xwin-LM

Xwin-LM: Powerful, Stable, and Reproducible LLM Alignment

Python · 1,012 stars · 41 forks · Updated May 31, 2024

openai / human-eval

Code for the paper "Evaluating Large Language Models Trained on Code"

Python · 2,260 stars · 325 forks · Updated Feb 5, 2024

leptonai / leptonai

A Pythonic framework to simplify AI service building

Python · 2,615 stars · 167 forks · Updated Aug 21, 2024

tairov / llama2.mojo

Inference Llama 2 in one file of pure 🔥

Mojo · 2,088 stars · 138 forks · Updated May 21, 2024

leptonai / examples

Lepton Examples

Jupyter Notebook · 140 stars · 18 forks · Updated Jul 25, 2024

Lightning-AI / lit-llama

Implementation of the LLaMA language model based on nanoGPT. Supports flash attention, Int8 and GPTQ 4bit quantization, LoRA and LLaMA-Adapter fine-tuning, pre-training. Apache 2.0-licensed.

Python · 5,926 stars · 510 forks · Updated Aug 22, 2024

openai / evals

Evals is a framework for evaluating LLMs and LLM systems, and an open-source registry of benchmarks.

Python · 14,563 stars · 2,560 forks · Updated Aug 20, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python · 6,210 stars · 1,645 forks · Updated Aug 22, 2024

google / BIG-bench

Beyond the Imitation Game collaborative benchmark for measuring and extrapolating the capabilities of language models

Python · 2,794 stars · 583 forks · Updated Jul 19, 2024

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel platforms ⚡

Python · 2,095 stars · 205 forks · Updated Aug 23, 2024

facebookincubator / AITemplate

AITemplate is a Python framework that renders neural networks into high-performance CUDA/HIP C++ code. It is specialized for FP16 TensorCore (NVIDIA GPU) and MatrixCore (AMD GPU) inference.

Python · 4,512 stars · 359 forks · Updated Aug 14, 2024

pytorch / torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

Python · 990 stars · 123 forks · Updated Apr 17, 2024

alibaba / BladeDISC

BladeDISC is an end-to-end DynamIc Shape Compiler project for machine learning workloads.

C++ · 786 stars · 159 forks · Updated Aug 20, 2024
