Stars
A framework for few-shot evaluation of language models.
Official PyTorch repository for "Extreme Compression of Large Language Models via Additive Quantization" (https://arxiv.org/pdf/2401.06118.pdf)
Code for paper: "QuIP: 2-Bit Quantization of Large Language Models With Guarantees"
AutoAWQ implements the AWQ algorithm for 4-bit quantization with a 2x speedup during inference. Documentation:
🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
Implementation of "BitNet: Scaling 1-bit Transformers for Large Language Models" in PyTorch
Code for QuaRot, an end-to-end 4-bit inference scheme for large language models.
[MLSys'24] Atom: Low-bit Quantization for Efficient and Accurate LLM Serving