Update README.md: Add Huggingface repo for 7B and 13B quantization (#142)

* Update README.md: Add Huggingface repo for 7B and 13B quantization
* Update requirements.txt to pin PEFT and BNB version

Reason -
For BNB: tloen/alpaca-lora#350
For PEFT: huggingface/peft@c21afbe#diff-b3b90f453dea37bf90203fd395e9dedc21b21c9a38464c6b1572368c049ef8b2L116-L128
megvii-research · May 1, 2023
commit 3348585 · 1 parent aca32f6

Showing 2 changed files with 4 additions and 3 deletions.
diff --git a/large_language_models/alpaca-qlora/requirements.txt b/large_language_models/alpaca-qlora/requirements.txt
@@ -3,6 +3,6 @@ loralib
 sentencepiece
 git+https://github.com/huggingface/transformers.git
 accelerate
-bitsandbytes
-git+https://github.com/huggingface/peft.git
-gradio
+bitsandbytes==0.37.2
+peft==0.2.0
+gradio
diff --git a/large_language_models/llama/quantization/README.md b/large_language_models/llama/quantization/README.md
@@ -1,4 +1,5 @@
 ### Update News
+- LLaMA-7B and 13B quantization are also available [here](https://huggingface.co/cnbeining/sparsebit-llama-quantization-7b-13b).
 - We have updated a llama-13b checkpoint with 3-bit 128-group quantization [here](https://drive.google.com/file/d/1LjZmOU8tr2VT6HdAP_WbuX8cqmrs5DrR). For config_cache and tokenizer_cache, the files can be found [here in huggingface](https://huggingface.co/decapoda-research/llama-13b-hf).
 - We implemented a cuda kernel for groupsize=128(int3/int4) & groupsize=64(int2). In our experiments, setting groupsize=128(int3) can make all quantization models achieve a significant increase in ppl compared to groupsize=-1. All results are updated in Table A.
 - We add `--single_device_mode` to support all quant models run in a single GPU(i.e. 2080ti). Please refer to the inference section for details.
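
The requirements.txt change in this commit pins `bitsandbytes==0.37.2` and `peft==0.2.0`. As a minimal sketch (not part of the commit), an environment could be checked against those pins with the standard-library `importlib.metadata`. The helper names `installed_version` and `check_pins` are hypothetical and introduced only for illustration.

```python
from importlib import metadata

# Pins copied from the requirements.txt change in this commit.
PINS = {"bitsandbytes": "0.37.2", "peft": "0.2.0"}


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_pins(pins, lookup=installed_version):
    """Return {package: (expected, found)} for every pin that does not match.

    `lookup` is injectable so the check can be exercised without installing
    the packages (hypothetical design choice for this sketch).
    """
    mismatches = {}
    for name, expected in pins.items():
        found = lookup(name)
        if found != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in check_pins(PINS).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

Running the script prints one line per mismatched or missing package and nothing when the environment matches the pins.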