
Popular repositories

  1. llm-awq

    Forked from mit-han-lab/llm-awq

    AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

    Python 1
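
A rough, generic illustration of the activation-aware idea behind this fork: scale up the weight channels that see large activations, then fake-quantize group-wise to 4 bits. This is a plain PyTorch sketch, not the llm-awq API; the function name, the 0.5 exponent, and the group size are illustrative assumptions.

```python
import torch

def awq_style_quantize(weight, act_absmax, alpha=0.5, n_bits=4, group_size=128):
    # weight: [out, in]; act_absmax: per-input-channel activation magnitudes, [in].
    # Amplify salient input channels so they lose less precision when quantized.
    s = act_absmax.clamp(min=1e-5).pow(alpha)
    s = s / (s.max() * s.min()).sqrt()            # keep the scales in a moderate range
    w = weight * s                                # per-input-channel scaling
    out_f, in_f = w.shape                         # assumes in_f % group_size == 0
    wg = w.reshape(out_f, in_f // group_size, group_size)
    qmax = 2 ** (n_bits - 1) - 1
    step = wg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-5) / qmax
    wq = (wg / step).round().clamp(-qmax - 1, qmax) * step   # group-wise fake quant
    return wq.reshape(out_f, in_f) / s            # fold scales back: drop-in weight
```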

  2. neural-compressor

    Forked from intel/neural-compressor

    Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool), which aims to provide unified APIs for network compression technologies such as low-precision quantization, spar…

    Python
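
As a hedged illustration of the kind of low-precision post-training quantization this tool targets, the snippet below deliberately uses PyTorch's built-in dynamic INT8 quantization rather than the Neural Compressor API itself (whose entry points vary across versions); the toy model is an assumption.

```python
import torch
import torch.nn as nn

# Toy FP32 model standing in for a real network.
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))

# Dynamic INT8 post-training quantization of the Linear layers: weights are
# stored in int8 and activations are quantized on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)
print(qmodel)
```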

  3. qlora

    Forked from artidoro/qlora

    QLoRA: Efficient Finetuning of Quantized LLMs

    Jupyter Notebook
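
QLoRA freezes the base model in 4-bit NF4 and trains low-rank adapters on top. A minimal sketch of the 4-bit loading step using the Hugging Face transformers + bitsandbytes integration; the checkpoint name is a placeholder, and attaching the adapters is shown under the peft entry below.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 4-bit base weights with double quantization and bf16 compute,
# mirroring the QLoRA recipe.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-350m",              # placeholder checkpoint
    quantization_config=bnb_config,
)
```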

  4. peft

    Forked from huggingface/peft

    🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

    Python
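
A minimal LoRA sketch with the peft API; the checkpoint, rank, alpha, and dropout are illustrative defaults, not values taken from this fork.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, TaskType, get_peft_model

base = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")  # placeholder

# Attach trainable low-rank adapters; the base weights stay frozen.
lora = LoraConfig(task_type=TaskType.CAUSAL_LM, r=16, lora_alpha=32, lora_dropout=0.05)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # only the adapter weights are trainable
```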

  5. OmniQuant

    Forked from OpenGVLab/OmniQuant

    OmniQuant is a simple and powerful quantization technique for LLMs.

    Python
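
A central ingredient of OmniQuant is making the clipping range of weight quantization learnable. Below is a rough plain-PyTorch sketch of learnable weight clipping; the sigmoid parameterization follows the paper's formulation, but the function name, parameter shapes, and bit width are assumptions.

```python
import torch

def lwc_fake_quant(weight, gamma, beta, n_bits=4):
    # Learnable weight clipping: the per-row max/min of the weights are shrunk
    # by sigmoid(gamma) and sigmoid(beta), which are optimized on calibration
    # data instead of being fixed heuristics. gamma, beta: [out, 1] (assumed).
    wmax = torch.sigmoid(gamma) * weight.amax(dim=1, keepdim=True)
    wmin = torch.sigmoid(beta) * weight.amin(dim=1, keepdim=True)
    scale = (wmax - wmin).clamp(min=1e-5) / (2 ** n_bits - 1)
    zero = (-wmin / scale).round()
    q = (weight / scale + zero).round().clamp(0, 2 ** n_bits - 1)
    return (q - zero) * scale  # fake-quantized weights for calibration
```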

  6. smoothquant

    Forked from mit-han-lab/smoothquant

    [ICML 2023] SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

    Python
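
SmoothQuant migrates quantization difficulty from activations to weights via per-input-channel smoothing factors s_j = max|X_j|^α / max|W_j|^(1−α). A short plain-PyTorch sketch of that factor, not the repo's API; the calibration statistics are assumed to be collected beforehand.

```python
import torch

def smoothquant_scales(act_absmax, weight, alpha=0.5):
    # s_j = max|X_j|**alpha / max|W_j|**(1 - alpha). At inference, x is divided
    # by s and the matching weight columns are multiplied by s, so x @ W.T is
    # unchanged but the activation distribution is flatter and easier to quantize.
    w_absmax = weight.abs().amax(dim=0).clamp(min=1e-5)   # per input channel, [in]
    a_absmax = act_absmax.clamp(min=1e-5)                 # calibration stats, [in]
    return (a_absmax.pow(alpha) / w_absmax.pow(1 - alpha)).clamp(min=1e-5)
```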
