Stars

- A framework for few-shot evaluation of language models. (Python · 5,876 stars · 1,567 forks · Updated Jul 18, 2024)
- Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf). (Python · 911 stars · 158 forks · Updated Jul 16, 2024)
- A quantization algorithm for LLMs. (Cuda · 86 stars · 5 forks · Updated Jun 21, 2024)
- Code for the paper "QuIP: 2-Bit Quantization of Large Language Models With Guarantees". (Python · 323 stars · 30 forks · Updated Feb 24, 2024)
- AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. (Python · 1,457 stars · 167 forks · Updated Jul 15, 2024)
- 🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning. (Python · 15,079 stars · 1,445 forks · Updated Jul 18, 2024)
- The official Meta Llama 3 GitHub site. (Python · 23,314 stars · 2,499 forks · Updated Jul 17, 2024)
- Official PyTorch implementation of QA-LoRA. (Python · 102 stars · 9 forks · Updated Mar 13, 2024)
- LLM inference in C/C++. (C++ · 61,817 stars · 8,861 forks · Updated Jul 18, 2024)
- Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch. (Python · 1,458 stars · 139 forks · Updated Jun 27, 2024)
- Code for QuaRot, end-to-end 4-bit inference of large language models. (Python · 211 stars · 16 forks · Updated Jun 1, 2024)
- [MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving. (Cuda · 222 stars · 15 forks · Updated Jul 2, 2024)
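Most of the repositories in this list center on low-bit weight quantization of LLMs. As a point of reference, the simplest baseline these methods improve upon is round-to-nearest (RTN) symmetric quantization. The sketch below is a generic illustration of that baseline, not the actual code of AWQ, QuIP, AQLM, or any other repo above — those add activation-aware scaling, incoherence processing, or additive codebooks on top of (or instead of) this scheme:

```python
import numpy as np

def quantize_rtn(w, bits=4):
    """Round-to-nearest symmetric quantization with a per-tensor scale.

    Maps float weights to signed integers in [-2^(bits-1), 2^(bits-1) - 1].
    """
    qmax = 2 ** (bits - 1) - 1             # e.g. 7 for signed 4-bit
    scale = np.abs(w).max() / qmax         # one scale for the whole tensor
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate float weights from integers and scale.
    return q.astype(np.float32) * scale

# Quantize random weights and measure the worst-case reconstruction error,
# which for RTN is bounded by half the scale factor.
rng = np.random.default_rng(0)
w = rng.normal(size=(4, 8)).astype(np.float32)
q, s = quantize_rtn(w, bits=4)
err = np.abs(w - dequantize(q, s)).max()
```

Per-channel or per-group scales (as used in practice by AWQ-style methods) follow the same pattern, just with one scale per row or per block of weights instead of one per tensor.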