Stars
A framework for few-shot evaluation of language models.
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf)
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving