
Move convert.py to examples/convert-legacy-llama.py #7430

Merged · 13 commits · May 30, 2024
Fix CI, scripts, readme files
Galunid committed May 21, 2024
commit 85e4e2b77790bf304318422e837f9a4d5cb61a6a
2 changes: 1 addition & 1 deletion CMakeLists.txt
@@ -1305,7 +1305,7 @@ set_target_properties(llama PROPERTIES PUBLIC_HEADER ${CMAKE_CURRENT_SOURCE_DIR}
install(TARGETS llama LIBRARY PUBLIC_HEADER)

install(
FILES convert.py
FILES convert-hf-to-gguf.py
PERMISSIONS
OWNER_READ
OWNER_WRITE
7 changes: 4 additions & 3 deletions README.md
@@ -690,7 +690,8 @@ Building the program with BLAS support may lead to some performance improvements

To obtain the official LLaMA 2 weights please see the <a href="#obtaining-and-using-the-facebook-llama-2-model">Obtaining and using the Facebook LLaMA 2 model</a> section. There is also a large selection of pre-quantized `gguf` models available on Hugging Face.

Note: `convert.py` does not support LLaMA 3, you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
Note: `convert.py` has been moved to `examples/convert-no-torch.py` and shouldn't be used for anything other than `Llama/Llama2/Mistral` models and their derivatives.
It does not support LLaMA 3; you can use `convert-hf-to-gguf.py` with LLaMA 3 downloaded from Hugging Face.
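For plain `.pth` checkpoints the moved script is invoked the same way `convert.py` was. A minimal sketch, assuming the weights live in a hypothetical `models/llama-2-7b/` directory (the script is renamed to `examples/convert-legacy-llama.py` later in this PR; the `--outtype` flag follows its use in `scripts/convert-gg.sh` below):

```bash
# legacy converter, .pth checkpoints only (LLaMA 1/2, Mistral and derivatives)
# assumed path: models/llama-2-7b/
python3 examples/convert-no-torch.py models/llama-2-7b/ --outtype f16
```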

```bash
# obtain the official LLaMA model weights and place them in ./models
@@ -707,10 +708,10 @@
ls ./models
python3 -m pip install -r requirements.txt

# convert the model to ggml FP16 format
python3 convert.py models/mymodel/
python3 convert-hf-to-gguf.py models/mymodel/

# [Optional] for models using BPE tokenizers
python convert.py models/mymodel/ --vocab-type bpe
python convert-hf-to-gguf.py models/mymodel/ --vocab-type bpe

# quantize the model to 4-bits (using Q4_K_M method)
./quantize ./models/mymodel/ggml-model-f16.gguf ./models/mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
```
4 changes: 2 additions & 2 deletions ci/run.sh
@@ -282,7 +282,7 @@ function gg_run_open_llama_3b_v2 {
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_QKK_64=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

python3 ../convert.py ${path_models}
python3 ../examples/convert-no-torch.py ${path_models}

model_f16="${path_models}/ggml-model-f16.gguf"
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
@@ -417,7 +417,7 @@ function gg_run_open_llama_7b_v2 {
(time cmake -DCMAKE_BUILD_TYPE=Release ${CMAKE_EXTRA} -DLLAMA_CUDA=1 .. ) 2>&1 | tee -a $OUT/${ci}-cmake.log
(time make -j ) 2>&1 | tee -a $OUT/${ci}-make.log

python3 ../convert.py ${path_models}
python3 ../examples/convert-no-torch.py ${path_models}

model_f16="${path_models}/ggml-model-f16.gguf"
model_q8_0="${path_models}/ggml-model-q8_0.gguf"
2 changes: 1 addition & 1 deletion docs/HOWTO-add-model.md
@@ -17,7 +17,7 @@ Also, it is important to check that the examples and main ggml backends (CUDA, M
### 1. Convert the model to GGUF

This step is done in python with a `convert` script using the [gguf](https://pypi.org/project/gguf/) library.
Depending on the model architecture, you can use either [convert.py](../convert.py) or [convert-hf-to-gguf.py](../convert-hf-to-gguf.py).
Depending on the model architecture, you can use either [convert-hf-to-gguf.py](../convert-hf-to-gguf.py) or [examples/convert-no-torch.py](../examples/convert-no-torch.py) (for `llama/llama2` models in `.pth` format).
Collaborator:
Given that .pth stands for pytorch, I think this name could be misleading. Why not call it convert-llama.py?

Collaborator (Author):

convert-legacy-llama.py maybe? I'd like to avoid giving the impression that it can be used for converting llama 3.


The convert script reads the model configuration, tokenizer, tensor names+data and converts them to GGUF metadata and tensors.
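As a rough illustration of that flow, here is a minimal sketch using the `gguf` Python package; the hyperparameter values and the zero tensor are placeholders, not taken from any real model:

```python
# Minimal sketch of what a convert script does with the gguf package.
# Hyperparameter values and the tensor below are placeholders.
import numpy as np
import gguf

writer = gguf.GGUFWriter("ggml-model-f16.gguf", arch="llama")

# metadata, normally read from the model's config and tokenizer
writer.add_context_length(4096)
writer.add_embedding_length(4096)
writer.add_block_count(32)

# tensors, renamed to GGUF naming conventions before writing
writer.add_tensor("token_embd.weight",
                  np.zeros((32000, 4096), dtype=np.float16))

writer.write_header_to_file()
writer.write_kv_data_to_file()
writer.write_tensors_to_file()
writer.close()
```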

4 changes: 2 additions & 2 deletions examples/llava/MobileVLM-README.md
@@ -54,10 +54,10 @@ python ./examples/llava/convert-image-encoder-to-gguf \
--projector-type ldpv2
```

4. Use `convert.py` to convert the LLaMA part of LLaVA to GGUF:
4. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:

```sh
python ./convert.py path/to/MobileVLM-1.7B
python ./examples/convert-no-torch.py path/to/MobileVLM-1.7B
```

5. Use `quantize` to convert LLaMA part's DataType from `fp16` to `q4_k`
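   (The actual command is collapsed in this diff; following the `./quantize` pattern shown in the main README above, a plausible invocation, with assumed paths, is:)

```sh
# assumed paths; follows the ./quantize pattern from the main README
./quantize path/to/MobileVLM-1.7B/ggml-model-f16.gguf path/to/MobileVLM-1.7B/ggml-model-q4_k.gguf q4_k
```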
6 changes: 3 additions & 3 deletions examples/llava/README.md
@@ -50,10 +50,10 @@ python ./examples/llava/llava-surgery.py -m ../llava-v1.5-7b
python ./examples/llava/convert-image-encoder-to-gguf.py -m ../clip-vit-large-patch14-336 --llava-projector ../llava-v1.5-7b/llava.projector --output-dir ../llava-v1.5-7b
```

5. Use `convert.py` to convert the LLaMA part of LLaVA to GGUF:
5. Use `examples/convert-no-torch.py` to convert the LLaMA part of LLaVA to GGUF:

```sh
python ./convert.py ../llava-v1.5-7b --skip-unknown
python ./examples/convert-no-torch.py ../llava-v1.5-7b --skip-unknown
```

Now both the LLaMA part and the image encoder are in the `llava-v1.5-7b` directory.
@@ -92,7 +92,7 @@ python ./examples/llava/convert-image-encoder-to-gguf.py -m vit --llava-projecto

6) Then convert the model to gguf format:
```console
python ./convert.py ../llava-v1.6-vicuna-7b/ --skip-unknown
python ./examples/convert-no-torch.py ../llava-v1.6-vicuna-7b/ --skip-unknown
```

7) And finally we can run the llava-cli using the 1.6 model version:
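(The command itself is collapsed in this diff; a plausible invocation based on llava-cli's standard flags, with assumed file names, is:)

```console
./llava-cli -m ../llava-v1.6-vicuna-7b/ggml-model-f16.gguf \
    --mmproj ../llava-v1.6-vicuna-7b/mmproj-model-f16.gguf \
    --image some-image.jpg -p "Describe the image in detail."
```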
98 changes: 0 additions & 98 deletions examples/make-ggml.py

This file was deleted.

20 changes: 10 additions & 10 deletions scripts/convert-gg.sh
@@ -3,20 +3,20 @@
set -e

# LLaMA v1
python3 convert.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
python3 convert.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
python3 convert.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
python3 convert.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama1/7B --outfile models/llama-7b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama1/13B --outfile models/llama-13b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama1/30B --outfile models/llama-30b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama1/65B --outfile models/llama-65b/ggml-model-f16.gguf --outtype f16

# LLaMA v2
python3 convert.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
python3 convert.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
python3 convert.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama2/llama-2-7b --outfile models/llama-7b-v2/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama2/llama-2-13b --outfile models/llama-13b-v2/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../llama2/llama-2-70b --outfile models/llama-70b-v2/ggml-model-f16.gguf --outtype f16

# Code Llama
python3 convert.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
python3 convert.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
python3 convert.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../codellama/CodeLlama-7b/ --outfile models/codellama-7b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../codellama/CodeLlama-13b/ --outfile models/codellama-13b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ../codellama/CodeLlama-34b/ --outfile models/codellama-34b/ggml-model-f16.gguf --outtype f16

# Falcon
python3 convert-falcon-hf-to-gguf.py ../falcon/falcon-7b 1
14 changes: 7 additions & 7 deletions scripts/pod-llama.sh
@@ -75,7 +75,7 @@ if [ "$1" -eq "1" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/tinyllama-1b --outfile ./models/tinyllama-1b/ggml-model-f16.gguf --outtype f16

./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_0.gguf q4_0
./quantize ./models/tinyllama-1b/ggml-model-f16.gguf ./models/tinyllama-1b/ggml-model-q4_k.gguf q4_k
@@ -90,7 +90,7 @@ if [ "$1" -eq "2" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-7b --outfile ./models/codellama-7b/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-7b/ggml-model-f16.gguf ./models/codellama-7b/ggml-model-q4_k.gguf q4_k
@@ -105,7 +105,7 @@ if [ "$1" -eq "3" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-13b --outfile ./models/codellama-13b/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-13b/ggml-model-f16.gguf ./models/codellama-13b/ggml-model-q4_k.gguf q4_k
@@ -120,7 +120,7 @@ if [ "$1" -eq "4" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-34b --outfile ./models/codellama-34b/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-34b/ggml-model-f16.gguf ./models/codellama-34b/ggml-model-q4_k.gguf q4_k
@@ -135,7 +135,7 @@ if [ "$1" -eq "5" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-7b-instruct --outfile ./models/codellama-7b-instruct/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-7b-instruct/ggml-model-f16.gguf ./models/codellama-7b-instruct/ggml-model-q4_k.gguf q4_k
@@ -150,7 +150,7 @@ if [ "$1" -eq "6" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-13b-instruct --outfile ./models/codellama-13b-instruct/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-13b-instruct/ggml-model-f16.gguf ./models/codellama-13b-instruct/ggml-model-q4_k.gguf q4_k
@@ -165,7 +165,7 @@ if [ "$1" -eq "7" ]; then

cd /workspace/llama.cpp

python3 convert.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16
python3 examples/convert-no-torch.py ./models/codellama-34b-instruct --outfile ./models/codellama-34b-instruct/ggml-model-f16.gguf --outtype f16

./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_0.gguf q4_0
./quantize ./models/codellama-34b-instruct/ggml-model-f16.gguf ./models/codellama-34b-instruct/ggml-model-q4_k.gguf q4_k