MXQ

Hardware-friendly Mixed-precision 2-4 Quantization Method with QAT Finetune

Usage:

1. Finetune steps

Note: the training data has not been uploaded yet. The code for 2/4-bit mixed-precision finetuning lives mainly in /LLM-QAT/models/util_quant.py, in class MXAsymQuantizer.
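For readers unfamiliar with asymmetric quantization, the following is a minimal sketch of min-max asymmetric fake quantization at 2 and 4 bits. It is not the repository's MXAsymQuantizer: the function names `asym_fake_quant` and `mean_abs_error`, the sample weights, and the use of plain Python lists are all assumptions made for illustration. How MXQ assigns 2-bit versus 4-bit precision is not described here and is not modeled.

```python
from typing import List


def asym_fake_quant(values: List[float], n_bits: int) -> List[float]:
    """Hypothetical helper: quantize to n_bits with a min-max range, then dequantize."""
    lo, hi = min(values), max(values)
    levels = 2 ** n_bits - 1            # number of steps between lo and hi
    if hi == lo:                        # constant group: nothing to quantize
        return list(values)
    scale = (hi - lo) / levels          # width of one quantization step
    out = []
    for v in values:
        q = round((v - lo) / scale)     # integer code, nominally in [0, levels]
        q = max(0, min(levels, q))      # clamp to guard against rounding drift
        out.append(q * scale + lo)      # map the code back to a float
    return out


def mean_abs_error(a: List[float], b: List[float]) -> float:
    """Average absolute difference between two equally long lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Illustrative weights only; not taken from any model.
weights = [-0.9, -0.35, -0.1, 0.0, 0.2, 0.45, 0.8, 1.1]
w2 = asym_fake_quant(weights, 2)        # at most 4 distinct values
w4 = asym_fake_quant(weights, 4)        # at most 16 distinct values
print("2-bit error:", mean_abs_error(weights, w2))
print("4-bit error:", mean_abs_error(weights, w4))
```

In actual QAT the same rounding would typically be applied to tensors during the forward pass, with gradients passed through the rounding step unchanged; the sketch only shows the quantize-dequantize arithmetic.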

2. Quantization steps

python main.py --model /user/jhli/quantization/model/Llama-2-7b-hf --prune_method mxq

To save the model for later evaluation:

python main.py --model /user/jhli/quantization/model/Llama-2-7b-hf --prune_method mxq --save_model <save_path>

3. harness-eval steps

cd mxq_quant/lm-evaluation-harness/
python setup.py install
cd ..
python lmeval.py --model hf-causal --model_args pretrained=<saved_model_path>,dtype=float16,use_accelerate=True --tasks winogrande,piqa,hellaswag,arc_easy,wikitext
