
TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficie…

C++ 7,871 859 Updated Aug 8, 2024

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone

Python 9,728 673 Updated Aug 9, 2024

A lightweight library for portable low-level GPU computation using WebGPU.

C++ 3,505 165 Updated Aug 9, 2024

Assembler for NVIDIA Volta and Turing GPUs

Python 190 40 Updated Jan 13, 2022

Speeds up long-context LLM inference by computing attention with approximate, dynamic sparsity, which reduces pre-filling latency by up to 10x on an A100 while maintaining accuracy.

Python 620 22 Updated Aug 8, 2024

A low-latency & high-throughput serving engine for LLMs

Python 133 18 Updated Aug 5, 2024

Efficient GPU support for LLM inference with x-bit quantization (e.g., FP6, FP5).

Cuda 164 14 Updated May 28, 2024

This is an online course where you can learn and master the skill of low-level performance analysis and tuning.

C++ 2,393 202 Updated Aug 6, 2024

A ChatGPT (GPT-3.5) & GPT-4 Workload Trace to Optimize LLM Serving Systems

Python 101 5 Updated Jul 9, 2024

An awesome repository of local AI tools

1,094 90 Updated Jun 21, 2024

Drop-in, local AI alternative to the OpenAI stack. Multi-engine (llama.cpp, TensorRT-LLM, ONNX). Powers 👋 Jan

C++ 1,836 96 Updated Aug 9, 2024

A WebUI for Efficient Fine-Tuning of 100+ LLMs (ACL 2024)

Python 28,728 3,522 Updated Aug 9, 2024

C++ Insights - See your source code with the eyes of a compiler

C++ 3,995 235 Updated Jul 26, 2024

Evolving Symbolic Pruning Metric from scratch

Python 60 5 Updated Jun 14, 2024

The official Meta Llama 3 GitHub site

Python 25,359 2,808 Updated Aug 8, 2024

A list of tutorials, papers, talks, and open-source projects for emerging compilers and architectures

350 31 Updated Aug 8, 2024

llama3 implementation one matrix multiplication at a time

Jupyter Notebook 11,761 912 Updated May 23, 2024

A large-scale simulation framework for LLM inference

Python 173 18 Updated Aug 1, 2024

Bayesian optimisation & Reinforcement Learning library developed by Huawei Noah's Ark Lab

Jupyter Notebook 3,139 562 Updated Aug 3, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 24,513 3,532 Updated Aug 9, 2024
Python 2 Updated Feb 16, 2024

LLM training in simple, raw C/CUDA

Cuda 22,550 2,509 Updated Aug 9, 2024

Development repository for the Triton language and compiler

C++ 12,197 1,467 Updated Aug 9, 2024

A PyTorch Extension: Tools for easy mixed precision and distributed training in PyTorch

Python 8,238 1,365 Updated Jul 25, 2024

Open deep learning compiler stack for CPUs, GPUs, and specialized accelerators

Python 11,499 3,416 Updated Aug 9, 2024

Universal LLM Deployment Engine with ML Compilation

Python 18,314 1,459 Updated Aug 9, 2024
C++ 12 2 Updated Jun 19, 2024

Official code for paper: Desigen: A Pipeline for Controllable Design Template Generation [CVPR'24]

Python 48 4 Updated Jul 18, 2024

[ACL 2024] A novel QAT with Self-Distillation framework to enhance ultra low-bit LLMs.

Python 53 8 Updated May 16, 2024

Ongoing research training transformer language models at scale, including: BERT & GPT-2

Python 1,788 337 Updated Aug 9, 2024