HKUST(Guangzhou)
(UTC +08:00) - https://lzzmm.github.io
Stars
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs.
MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
A lightweight library for portable low-level GPU computation using WebGPU.
To speed up long-context LLM inference, approximate and dynamic sparse attention computation reduces inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
A low-latency & high-throughput serving engine for LLMs
Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).
This is an online course where you can learn and master the skill of low-level performance analysis and tuning.
A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems
An awesome repository of local AI tools
Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan
A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
C++ Insights - See your source code with the eyes of a compiler
A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures
llama3 implementation one matrix multiplication at a time
A large-scale simulation framework for LLM inference
Bayesian optimisation & reinforcement learning library developed by Huawei Noah's Ark Lab
A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list)
Development repository for the Triton language and compiler (a kernel sketch follows this list)
A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch
Open deep learning compiler stack for CPU, GPU, and specialized accelerators
Universal LLM Deployment Engine with ML Compilation
Official code for paper: Desigen: A Pipeline for Controllable Design Template Generation [CVPR'24]
[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.
microsoft / Megatron-DeepSpeed
Forked from NVIDIA/Megatron-LM. Ongoing research training transformer language models at scale, including BERT & GPT-2.
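
As a rough illustration of the vLLM serving engine starred above, here is a minimal offline-inference sketch; the model name, prompt, and sampling values are placeholders chosen for the example, not anything taken from this list:

```python
# Minimal vLLM offline-inference sketch (model/prompt/sampling values are placeholders).
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face model ID; a small one keeps the demo cheap
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# generate() batches the prompts and returns one RequestOutput per prompt.
outputs = llm.generate(["Paged attention helps LLM serving because"], params)
for out in outputs:
    print(out.outputs[0].text)
```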
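And for the Triton entry, a minimal sketch of what a Triton kernel looks like, following the project's introductory vector-add tutorial; the block size and names are illustrative:

```python
# Minimal Triton vector-add kernel, after the project's introductory tutorial.
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    pid = tl.program_id(axis=0)                    # each program instance handles one block
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements                    # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)                 # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```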