Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <[email protected]>
prusnak and ggerganov committed Mar 13, 2023
1 parent 1808ee0 commit d1f2247
Showing 2 changed files with 18 additions and 31 deletions.
34 changes: 3 additions & 31 deletions README.md
@@ -145,44 +145,16 @@
python3 -m pip install torch numpy sentencepiece
python3 convert-pth-to-ggml.py models/7B/ 1

# quantize the model to 4-bits
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin 2
./quantize.sh 7B

# run the inference
./main -m ./models/7B/ggml-model-q4_0.bin -t 8 -n 128
```
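The new `quantize.sh` introduced by this commit accepts only model sizes of the form one or two digits followed by `B` (see the script diff further down). A minimal standalone sketch of that argument check, runnable without the repository; the helper name `check_size` is hypothetical:

```shell
#!/usr/bin/env bash
# Mirrors the size-argument validation in quantize.sh: one or two
# digits followed by an uppercase B (7B, 13B, 30B, 65B).
check_size() {
    if [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
        echo valid
    else
        echo invalid
    fi
}

check_size 7B    # valid
check_size 65B   # valid
check_size 7b    # invalid: the B must be uppercase
```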

For the bigger models, there are a few extra quantization steps. For example, for LLaMA-13B, converting to FP16 format
will create 2 ggml files, instead of one:

```bash
ggml-model-f16.bin
ggml-model-f16.bin.1
```

You need to quantize each of them separately like this:

```bash
./quantize ./models/13B/ggml-model-f16.bin ./models/13B/ggml-model-q4_0.bin 2
./quantize ./models/13B/ggml-model-f16.bin.1 ./models/13B/ggml-model-q4_0.bin.1 2
```

Everything else is the same. Simply run:

```bash
./main -m ./models/13B/ggml-model-q4_0.bin -t 8 -n 128
```

The number of files generated for each model is as follows:

```
7B -> 1 file
13B -> 2 files
30B -> 4 files
65B -> 8 files
```

When running the larger models, make sure you have enough disk space to store all the intermediate files.

TODO: add model disk/mem requirements
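The per-file commands in the section above (which this commit replaces with `quantize.sh`) generalize to any number of shards. A hedged sketch that only prints the commands rather than running them, with paths assumed from the README excerpt:

```shell
#!/usr/bin/env bash
# Prints the quantize invocation for each FP16 shard of the 13B model.
# Nothing is executed; this just shows how the per-file pattern scales.
for suffix in "" ".1"; do
    echo "./quantize ./models/13B/ggml-model-f16.bin${suffix} ./models/13B/ggml-model-q4_0.bin${suffix} 2"
done
```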

### Interactive mode

If you want a more ChatGPT-like experience, you can run in interactive mode by passing `-i` as a parameter.
15 changes: 15 additions & 0 deletions quantize.sh
@@ -0,0 +1,15 @@
#!/usr/bin/env bash

if ! [[ "$1" =~ ^[0-9]{1,2}B$ ]]; then
    echo
    echo "Usage: quantize.sh 7B|13B|30B|65B [--remove-f16]"
    echo
    exit 1
fi

for i in `ls models/$1/ggml-model-f16.bin*`; do
    ./quantize "$i" "${i/f16/q4_0}" 2
    if [[ "$2" == "--remove-f16" ]]; then
        rm "$i"
    fi
done
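The output filename in the loop above comes from bash's pattern substitution, `${i/f16/q4_0}`, which replaces the first occurrence of `f16` in the path with `q4_0`. A standalone illustration:

```shell
#!/usr/bin/env bash
# ${var/pattern/replacement} substitutes the first match only,
# turning each f16 file path into its q4_0 counterpart.
i="models/13B/ggml-model-f16.bin.1"
echo "${i/f16/q4_0}"   # models/13B/ggml-model-q4_0.bin.1
```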
