Stars

- A framework for few-shot evaluation of language models. (Python · 5,876 stars · 1,567 forks · Updated Jul 18, 2024)
- Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf). (Python · 911 stars · 158 forks · Updated Jul 16, 2024)
- A quantization algorithm for LLMs. (Cuda · 86 stars · 5 forks · Updated Jun 21, 2024)
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees". (Python · 323 stars · 30 forks · Updated Feb 24, 2024)
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. (Python · 1,457 stars · 167 forks · Updated Jul 15, 2024)
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. (Python · 15,079 stars · 1,445 forks · Updated Jul 18, 2024)
- The official Meta Llama 3 GitHub site. (Python · 23,314 stars · 2,499 forks · Updated Jul 17, 2024)
- Official PyTorch implementation of QA-LoRA. (Python · 102 stars · 9 forks · Updated Mar 13, 2024)
- LLM inference in C/C++. (C++ · 61,817 stars · 8,861 forks · Updated Jul 18, 2024)
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch. (Python · 1,458 stars · 139 forks · Updated Jun 27, 2024)
- Code for QuaRot, end-to-end 4-bit inference of large language models. (Python · 211 stars · 16 forks · Updated Jun 1, 2024)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving. (Cuda · 222 stars · 15 forks · Updated Jul 2, 2024)
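Most of the repositories in this list center on low-bit weight quantization of LLMs. As a point of reference, the simplest baseline these methods improve upon is round-to-nearest (RTN) symmetric quantization. The sketch below is a generic illustration of that baseline, not the actual code of AWQ, QuIP, AQLM, or any other repo above — those add activation-aware scaling, incoherence processing, or additive codebooks on top of (or instead of) this scheme:

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest symmetric quantization with a per-tensor scale.

    Maps float weights to signed integers in [-2^(bits-1), 2^(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax         # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from integers and scale.
    return q.astype(np.float32) * scale

# Quantize random weights and measure the worst-case reconstruction error,
# which for RTN is bounded by half the scale factor.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = quantize_rtn(w, bits=4)
err = np.abs(w - dequantize(q, s)).max()
```

Per-channel or per-group scales (as used in practice by AWQ-style methods) follow the same pattern, just with one scale per row or per block of weights instead of one per tensor.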