HKUST(Guangzhou)
(UTC +08:00) - https://lzzmm.github.io
Stars
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
A lightweight library for portable low-level GPU computation using WebGPU.
To speed up long-context LLM inference, approximate and dynamic sparse attention computation reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
A low-latency & high-throughput serving engine for LLMs
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
An awesome repository of local AI tools
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
C++ Insights - See your source code with the eyes of a compiler
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
llama3 implementation one matrix multiplication at a time
A large-scale simulation framework for LLM inference
Bayesian optimisation & reinforcement learning library developed by Huawei Noah's Ark Lab
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
Development repository for the Triton language and compiler (a kernel sketch follows this list)
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Universal LLM Deployment Engine with ML Compilation
Official code for paper: Desigen: A Pipeline for Controllable Design Template Generation [CVPR'24]
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including BERT & GPT-2.
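
As a rough illustration of the vLLM serving engine starred above, here is a minimal offline-inference sketch; the model name, prompt, and sampling values are placeholders chosen for the example, not anything taken from this list:

```python
# Minimal vLLM offline-inference sketch (model/prompt/sampling values are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model ID; a small one keeps the demo cheap
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Paged attention helps LLM serving because"], params)
for out in outputs:
    print(out.outputs[0].text)
```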
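And for the Triton entry, a minimal sketch of what a Triton kernel looks like, following the project's introductory vector-add tutorial; the block size and names are illustrative:

```python
# Minimal Triton vector-add kernel, after the project's introductory tutorial.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # each program instance handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                 # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```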