convert-*.py: GGUF Naming Convention Refactor and Metadata Override Refactor #7499

Merged · 66 commits · Jul 18, 2024

Conversation

mofosyne (Collaborator) commented May 23, 2024

In the interest of encouraging users to stick to the GGUF naming convention, I'm porting the default-filename-generation changes I made in convert.py over to convert-hf-to-gguf.py.

I would also like to port the metadata override to the other converter scripts.


This is a work in progress, opened as a draft PR to gather early feedback.

For example, I don't yet know how to calculate the tensor parameter count for convert-llama-ggml-to-gguf.py. I also need to figure out a clean way for all of these scripts to share the same metadata override settings.
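For illustration, a total tensor parameter count can in principle be derived by summing element counts over the model's tensors. A minimal sketch, assuming the tensors are already loaded as numpy arrays (the helper name is hypothetical, not this PR's code):

import numpy as np

def model_parameter_count(tensors: dict[str, np.ndarray]) -> int:
    # The total parameter count is simply the sum of the element
    # counts of every weight tensor in the model.
    return sum(int(np.prod(t.shape)) for t in tensors.values())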

@mofosyne mofosyne self-assigned this May 23, 2024
@mofosyne mofosyne marked this pull request as draft May 23, 2024 17:51
@github-actions github-actions bot added the python label May 23, 2024
@mofosyne mofosyne added the refactoring and Review Complexity: Medium labels May 23, 2024
Review comment from a Collaborator on convert-hf-to-gguf.py, on this hunk:

expert_count_chunk = f"{expert_count_int}x" if expert_count_int is not None else ""
parameters = model_parameter_count_rounded_notation(model_params_count)
encodingScheme = encodingScheme.upper()
return f"{name}{version}-{expert_count_chunk}{parameters}-{encodingScheme}"

Isn't this missing a .gguf? Unless it's intended to be added at call sites?
@mofosyne mofosyne marked this pull request as ready for review May 24, 2024 08:29
mofosyne (Collaborator, Author) commented May 24, 2024

Setting this to ready for review, as it looks workable now.

I was hoping to also include convert-llama-ggml-to-gguf.py and ./examples/convert-llama2c-to-ggml/convert-llama2c-to-ggml.cpp, but wasn't sure yet how best to extract the parameter count and other metadata required to generate the default filename.

As I understand it, most people only use the convert-hf-to-gguf.py or convert.py converter scripts anyway, so this should hopefully cover most use cases. The main objective of this PR is really just to bootstrap the naming convention by making the new GGUF naming convention the default behavior.

mofosyne (Collaborator, Author) commented:

Rebased on top of the CI fixes in the main branch for cleaner CI checks.

@mofosyne mofosyne added the help wanted Extra attention is needed label May 25, 2024
mofosyne (Collaborator, Author) commented May 28, 2024

@compilade thanks for the review so far. I've given it some more thought and think this is getting closer to the ideal.

By the way, I'm having second thoughts about the position of the version string: should it be TinyLLama-v0.1-5M-F16.gguf or TinyLLama-5M-F16-v0.1.gguf? Imagine these files sorted alphanumerically in a directory; I now suspect people would want versions grouped together.

Also, what about -instruct in Mixtral-8x7B-F16-Instruct-v0.1.gguf (as Mixtral tends to do)? Do we call that a 'variant'? Is it basically a common model with extra finetuning? Or is Mixtral-Instruct-8x7B-F16-v0.1.gguf, where the model name is Mixtral-Instruct (the current approach), good enough? That's my preference.

mofosyne (Collaborator, Author) commented May 30, 2024

#7430 has been merged, so this will need a refactor.

@mofosyne mofosyne marked this pull request as draft May 30, 2024 15:19
mofosyne (Collaborator, Author) commented May 30, 2024

@compilade rebased, as convert.py is now a legacy script. Also, after studying the naming schemes in the wild, I think Mistral had the right idea with their naming arrangement, so this is the new arrangement I would like to propose:

<basename>-<expert_count>x<per_model_weight>-<encoding>-<finetune>-<version>-<shard>.gguf
Mixtral-8x7b-Q5_K_M-instruct-v0.1.gguf
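For illustration, assembling a default filename under this arrangement could look like the following minimal sketch (a hypothetical helper, not this PR's code; any missing field is simply dropped):

def naming_convention(basename: str, size_label: str | None, encoding: str,
                      finetune: str | None = None, version: str | None = None,
                      shard: str | None = None) -> str:
    # Join the fields that are present, in the proposed order:
    # <basename>-<size_label>-<encoding>-<finetune>-<version>-<shard>.gguf
    parts = [basename, size_label, encoding, finetune, version, shard]
    return "-".join(p for p in parts if p) + ".gguf"

# naming_convention("Mixtral", "8x7b", "Q5_K_M", "instruct", "v0.1")
#   -> "Mixtral-8x7b-Q5_K_M-instruct-v0.1.gguf"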

If this PR passes, I'll make the corresponding change to https://github.com/ggerganov/ggml/blob/master/docs/gguf.md

mofosyne (Collaborator, Author) commented:

@compilade rebased to fix a conflict caused by 97bdd26, which added model_instance.gguf_writer.add_type(gguf.GGUFType.MODEL) and the new "general.type" key to constants.py.

> What should happen when there are multiple output files (like when using split options)? Should --dry-run output the plan in JSON, or should it only print the "base" output file name to stdout? (That last one could actually be very simple to implement.) It seems like logger.* prints to stderr, and I think shell expansion with "$(command)" only uses stdout to form the string. What do you think?

Outputting JSON seems a bit convoluted. Anyway, I'll do it as a separate PR, since I classify it as a nice-to-have that is slightly out of scope for the KV-store refactoring.
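For reference, the stdout/stderr split being discussed is easy to demonstrate. A minimal sketch (not the script's actual code):

import sys
import logging

logging.basicConfig(stream=sys.stderr, level=logging.INFO)
logging.getLogger("convert").info("writing model...")  # stderr: invisible to $(...)
print("Mixtral-8x7b-instruct-v0.1-Q5_K_M.gguf")        # stdout: captured by $(...)

With that split, OUT="$(python convert_hf_to_gguf.py --dry-run ...)" would capture only the printed filename.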

> I think defaults are important, because most users won't use --metadata (partly because of docs, also laziness). And the convert script is used on all kinds of models, so I think it should ideally work mostly correctly in all cases. A regex is fine, as long as it fails gracefully.

That makes sense. Partly it's also that I don't want to spend too much brainpower on it. Anyway, I would prefer a regex, and I hope the new test cases I added make its graceful-failure behavior more robust (i.e., on failure it outputs no breakdown of subcomponents).

> Again, --metadata isn't used by default, and size_label will be stored in the model. In this case, it's not that bad though, but it's a different convention than what the model authors used.

Yeah, I see now that it's not super intuitive. At least it sounds like it's not a blocker here.

> It's true that vocabs don't have model sizes. The default name of vocab files should likely only include the basename (and also finetune, because sometimes they change the vocab slightly), something like {basename}{finetune}{version}-vocab.gguf.

I actually agree with you there; we could have a different form specifically for vocab output: {basename}{finetune}{version}-vocab.gguf. I'll see if I can add that over the next few days, unless you'd like to add it yourself.

> Agreed with the feature freeze. I simply want to make sure this doesn't cause things we'll regret later on to be present in the metadata of GGUF models, like that 4k model size for Phi-3-mini-4k-Instruct.

I agree we want to avoid long-term issues, hence the refactor, but I'm also a bit annoyed that this PR has reached the point where I'm constantly fixing conflicts with the main branch. So I'd really like to feature freeze, get just the most important stuff in (the things we'd regret not adjusting, as mentioned above), and merge as soon as possible.

It might help with the convert_lora_to_gguf.py script if default values were added here

Co-authored-by: compilade <[email protected]>
compilade (Collaborator) commented Jul 15, 2024

> I'm also a bit annoyed that this PR is getting to the point that I'm constantly fixing conflicts with changes to main branch.

@mofosyne I agree this can be annoying. Something I've found useful with my long-lived PRs is to use git merge master instead of git rebase -i master, so that I can fix the conflicts once, against the latest version, instead of fixing conflicts for each older commit of the PR. And when finally squash-merging onto master, the merge commits are not included in the squashed commit message.

mofosyne (Collaborator, Author) commented Jul 16, 2024

@compilade thanks for the tip about using merge instead of rebase.

I have also addressed your other point about needing to add in the vocab only default filename.

If it all looks good, don't forget to press the approve button so we can get this merged. I'm also happy for you to push changes directly if you spot anything you want to correct.

mofosyne (Collaborator, Author) commented Jul 17, 2024

@compilade By the way, while you're having a look through, I'm wondering what you're still looking for. I think I've addressed all the major pain points now, but if there's anything I've overlooked, do let me know. As far as I know all the points are addressed, but this PR is still blocked because it's marked as 'compilade requested changes'.

compilade (Collaborator) commented Jul 17, 2024

> just wondering what you are still looking for?

@mofosyne

  • The Phi 4k problem is still a thing. It might be possible to parse the size label from the name and ignore it if it's too far from the real size. (Or keep it if unparsable; that way only simple cases like context-size confusion are handled.)
  • I feel like a lot of the functions in gguf-py/gguf/utility.py belong in gguf-py/gguf/metadata.py, but I'm not sure.
  • Need to test how this interacts with LoRA adapters from convert_lora_to_gguf.py (because it uses Model from convert_hf_to_gguf.py)
  • I'd like to pick random model names from Hugging Face and see what can be improved in the regex. (My guess is Qwen1.5-something would fail because of the dot in the name, and Refact-1_6B-fim because of the underscore in the size.)
  • Should the size label be normalized in some way?
    • consistent suffix capitalization (k vs K, m vs M, b vs B, etc), while the x in MoE size is kept lowercased
      • might be easily fixed with size_label.upper().replace("X", "x") (see the sketch after this list)
    • underscore to dot
  • Not a blocker, but an idea for later would be to do something similar to how {ftype} templates are done, but for the metadata too.
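For the normalization idea, here is a minimal sketch combining the two rules above (a hypothetical helper built around the one-liner suggested in the list):

def normalize_size_label(size_label: str) -> str:
    # Capitalize the magnitude suffix (k/m/b/t -> K/M/B/T), keep the MoE
    # multiplier 'x' lowercase, and map underscores to dots, so that
    # e.g. "8x7b" -> "8x7B" and "1_6B" -> "1.6B".
    return size_label.upper().replace("X", "x").replace("_", ".")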

> but this PR ticket is still locked because it's marked as 'compilade requested changes'

Just so you know, if you believe I'm unresponsive and my requested changes have been addressed, you can always "dismiss" a review (from the reviewer summary at the bottom), though I think the reason given is not editable afterwards.

@mofosyne mofosyne dismissed compilade’s stale review July 17, 2024 21:10

Had another look at all the code marked with requested changes; it all looks to be resolved already.

mofosyne (Collaborator, Author) commented:

> * The Phi `4k` problem is still a thing. Might be possible to parse the size label from the name and ignore it if it's too far from the real size. (Or keep it if unparsable, that way only simple cases like context size confusion can be handled)

I think in this case it's unavoidable, as Microsoft is clearly going against community convention here. So: wontfix, in this PR at least.

> * I feel like a lot of the functions in `gguf-py/gguf/utility.py` belong in `gguf-py/gguf/metadata.py`, but I'm not sure.

Easy to sort out in a future PR if needed.

> * Need to test how this interacts with LoRA adapters from `convert_lora_to_gguf.py` (because it uses `Model` from `convert_hf_to_gguf.py`)

Hmmm... I'm tempted to keep the PR open if this is going to uncover an annoying issue later. But again, probably fixable in a later PR.

> * I'd like to pick random model names from Hugging Face and see what can be improved in the regex. (My guess is `Qwen1.5-something` would fail because of the dot in the name, also `Refact-1_6B-fim` because of the underscore in the size)

Yeah, might be worth testing. I'm a bit busy, but if you can slide in some changes, that would be appreciated.

> * Should the size label be normalized in some way?

Undecided, but we can fix that later.

> * Not a blocker, but an idea for later would be to do something similar to how `{ftype}` templates are done, but for the metadata too.

True, let's keep that in mind for a later PR.


Will merge in a day or two if there are no objections, but it would be good to see if we can handle Qwen1.5-something and Refact-1_6B-fim. @compilade

ThiloteE commented Jul 17, 2024

I adapted my test cases from https://regex101.com/r/NlzLoK/1 to more closely (but not perfectly) fit this PR.

Feel free to use and adapt these as you see fit for unit tests and the matching regex; a sketch of one possible regex follows the lists below. To clarify: I have not tried the PR; I'm just posting this here for inspiration.


### Probably (?) should match.

ModelNameMoE-v0.3-8xA7b-Method-Q4_0
ModelNameMoE-v0.3-8x7b-Method-Q4_0
ModelNameMoE-v12.2-2x7b-Method-QQ4_k_m
ModelNameMoE-v1.2-2x7b-Method-qq4_k_m
ModelNameMoE-v1.2-2x7b-Method-Qq4_k_m
ModelNameMoE-v1.2-2x7b-Method-qQ4_k_m
ModelNameMoE-v1.2-2x7b-Method-IQ4_S
ModelNameMoE-v1.2-2x7b-Method-IQ4_XS
ModelNameMoE-v1.2-2x7b-Method-IQ4_XXS
ModelNameMoE-v1.2-2x7b-Method-Q4
ModelNameMoE-v1.2-2x7b-Method-
ModelNameMoE-v1.2-2x7b-Method
ModelNameMoE-v1.2-2x0.7b-Method
ModelNameMoE-v1.2-2x0.77b-Method
ModelNameMoE-v1.2-2x0.777b-Method
ModelNameMoE-v1.2-22x7b-Method
ModelNameMoE-v1.2-222x7b-Method
ModelNameMoE-v1.2-222x0.77b-Method
ModelNameMoE-v1.2-22x0.77b-Method
ModelNameMoE-v1.2-2x7b-Method
ModelName-v1.2-7b-Method
ModelName-v1.2-0.7b-Method
ModelName-v1.2-700M-Method
ModelName-v1.2-70M-Method
ModelName-v1.2-7M-Method
ModelName-v1.2-0.7M-Method
ModelName-v1.2-723b-Method
ModelName-v1.2-7000b-Method
ModelName-v1.222222222-7000b-Method
ModelName-v1.0-7000b-Method
ModelName-v1-7000b-Method
ModelName-v1.2-Method-7b
MissingDashBetweenMethodAndQuantizationformat-v1.2-7b-MethodQ4_K_M
ModelNameMultipleMethods-v2.0-7b-Instruct-DPO-Q4_0.gguf
ModelNameMultipleMethods-v2.0-Instruct-DPO-7b-Q4_0.gguf
ModelNameMultipleMethods-v2.0-7b-4k-Instruct-Merged-DPO-Q4_0

Real Models:

Mixtral-8x7b-instruct-v0.1-Q5_K_M.gguf
Hermes-2-Theta-Llama-3-8B
SOLAR-10.7B-Instruct-v1.0



### Deviations from the regex pattern: discouraged names (should not match).

.StartModelNameWithNon_AlphaNumeric-v0.1-7b-Method-Q4_K_M
9StartModelNameWithNumber-v0.1-7b-Method-Q4_K_M
_StartModelNameWithUnderScore-v1-7b-Method-Q4_K_M
-StartModelNameWithDash-v0.2-2x7b-Method-Q4_K_M
Missingdashv0.2-7b-Method-Q4_K_M
Missingdash-v0.27b-Method-Q4_K_M
Missingdash-v0.2-7bMethod-Q4_K_M
Missingdash-v0.2-7bMethodQ4_K_M
MissingdashAfterMoE-v0.2-2x7bMethod-Q4_K_M
Missingdash-v1-7bMethod-Q4_K_M
MissingParameterCount-v0.2-Method
MissingMethod-v0.2-7b
MissingMethod-v0.2-7b-
MissingMethodAfterMoE-v0.2-8x7b
MissingMethodAfterMoE-v0.2-8x7b-
Missing_v_afterdashMoE-0.2-2x7b-Method-Q4_K_M
Missing_v_afterdash-0.2-7b-Method-QQ4_K_M
Modelnamecontainsdash_phi-3-v0.2-7b-Method-IQ4_XXS
ModelnamecontainsParaMeterCount_phi37b-v0.2-Method-IQ4_XXS
ModelnamecontainsParaMeterCount_phi37b-v0.2-7b-Method-IQ4_XXS
ModelnameQuantbeforeMethod-7b-Q5_K_M-Method-v0.1.gguf
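For illustration, a single-regex decomposition in the spirit of these cases could look like the sketch below. The pattern is hypothetical, deliberately does not cover everything above (reordered fields, for instance), and is not what the PR ultimately used; as described further down, the PR moved to multiple smaller regexes instead:

import re

MODEL_NAME_RE = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9]*)"                # basename, no dashes
    r"-v(?P<version>\d+(?:\.\d+)?)"                   # e.g. -v1.2
    r"-(?P<size>(?:\d+x)?\d+(?:\.\d+)?[KMBTkmbt])"    # e.g. -2x7b, -700M
    r"-(?P<finetune>[A-Za-z][A-Za-z0-9-]*?)"          # e.g. -Method
    r"(?:-(?P<quant>[IQq]{1,2}\d\S*))?$"              # optional, e.g. -Q4_K_M
)

m = MODEL_NAME_RE.match("ModelNameMoE-v1.2-2x7b-Method-Q4_0")
assert m and m.group("size") == "2x7b" and m.group("quant") == "Q4_0"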

@compilade compilade self-requested a review July 18, 2024 01:44
Using more than one regex to annotate the parts of the name; this way the order doesn't have to be fixed, and it should work correctly for more edge cases.

Also, the total parameter count of the model is used to figure out
if a size label is not actually a size label, but a context size.

* convert_lora : fix duplicate model type key
* gguf-py : output the split plan on stdout when using dry_run

* convert_hf : unify vocab naming convention with the standard one

This also adds a way to name LoRA models.
compilade (Collaborator) commented Jul 18, 2024

@mofosyne I've fixed most of the problems with the regex by using more than one regex, each on a smaller part of the name at a time. It was otherwise hard to allow for variations in the order of fields.

I'll try to summarize what my recent commits did:

  • Instead of a single regex to separate the fields of a model name, split it with - and annotate each part by matching it individually against regexes of easily-identifiable sections, like the version and size label. Then identify where the basename ends and make the finetune field use the rest that was not annotated. (The rules are a bit more complex than that, but it's close enough for a brief explanation; a condensed sketch follows this list.)
    • This allows accepting any order of size label, finetune, version, and other stuff, to then use that to build a nicely ordered name following the naming scheme.
  • The Phi 4k problem is no longer a problem, because the total number of parameters of the converted model is now used to ignore what looks like a size label but which is too small.
    • When there's 4k in the name of a model with much more than 4k parameters, 4k is now part of the finetune field instead of the size_label, which should make sense.
  • Normalize the suffix of the size label.
    • Use capital letters for K, M, B, and T.
    • When something which looks like a size label is instead put into finetune, its suffix is set to lower case.
  • Fixed convert_lora_to_gguf.py which was broken because self.gguf_writer.add_type was called in prepare_metadata in convert_hf_to_gguf.py.
  • More test cases for the name parsing
    • Still not totally comprehensive, but it's better.
  • Unify the vocab naming convention function with the other one, by adding a model_type argument which can either be 'vocab', 'LoRA' or None.
    • Yes, this also makes LoRA models have the -LoRA suffix. I've put it after the encoding scheme to cleanly separate it from possible finetunes using "LoRA" in their name.
  • --dry-run now prints the output file names on stdout, so that the output can be used in scripts
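A condensed sketch of the annotate-the-parts idea (hypothetical code; the real implementation in gguf-py has more rules, but this shows the shape of it):

import re

VERSION_RE = re.compile(r"^v\d+(?:\.\d+)*$")
SIZE_RE = re.compile(r"^(?:\d+x)?\d+(?:[._]\d+)?[KMBTkmbt]$")
SCALE = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}

def parse_size(label: str) -> float:
    # "8x7b" -> 8 * 7e9, "1_6B" -> 1.6e9, "4k" -> 4000.0
    count, _, per = label.partition("x")
    num = per or count
    multiplier = float(count) if per else 1.0
    return multiplier * float(num[:-1].replace("_", ".")) * SCALE[num[-1].upper()]

def annotate(name: str, total_params: int) -> dict:
    parts = name.split("-")
    kinds = []
    for part in parts:
        if VERSION_RE.match(part):
            kinds.append("version")
        elif SIZE_RE.match(part) and parse_size(part) > total_params / 100:
            # A size-looking token far below the real parameter count
            # (like Phi-3's "4k" context size) is NOT a size label.
            kinds.append("size_label")
        else:
            kinds.append("other")
    # The basename is the leading run of unannotated parts; unannotated
    # parts after the first recognized field become the finetune.
    first = next((i for i, k in enumerate(kinds) if k != "other"), len(parts))
    return {
        "basename": "-".join(parts[:first]) or None,
        "version": next((p for p, k in zip(parts, kinds) if k == "version"), None),
        "size_label": next((p for p, k in zip(parts, kinds) if k == "size_label"), None),
        "finetune": "-".join(p for p, k in zip(parts[first:], kinds[first:]) if k == "other") or None,
    }

# annotate("Mixtral-8x7b-instruct-v0.1", 46_700_000_000) ->
#   {'basename': 'Mixtral', 'version': 'v0.1', 'size_label': '8x7b', 'finetune': 'instruct'}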

I'm starting to be mostly satisfied with the behavior of convert_hf_to_gguf.py and convert_lora_to_gguf.py, but some things are still left to do:

  • Decide what should happen with examples/convert_legacy_llama.py.
  • In convert_lora_to_gguf.py, use the directory of the LoRA adapter for metadata purposes instead of the one of the base model
    • Might be complicated, because config.json is in the base model's directory and usually not in the adapter's directory.
  • (maybe) move the functions in gguf-py/gguf/utility.py to gguf-py/gguf/metadata.py.

mofosyne (Collaborator, Author) commented:
$ python -m unittest discover ./gguf-py -v
test_apply_metadata_heuristic_from_hf_parameters (tests.test_metadata.TestMetadataMethod) ... ok
test_apply_metadata_heuristic_from_model_card (tests.test_metadata.TestMetadataMethod) ... ok
test_apply_metadata_heuristic_from_model_dir (tests.test_metadata.TestMetadataMethod) ... ok
test_get_model_id_components (tests.test_metadata.TestMetadataMethod) ... ok
test_id_to_title (tests.test_metadata.TestMetadataMethod) ... ok
test_apply_metadata_heuristic_from_hf_parameters (tests.test_metadata.TestMetadataMethod) ... ok
test_apply_metadata_heuristic_from_model_card (tests.test_metadata.TestMetadataMethod) ... ok
test_apply_metadata_heuristic_from_model_dir (tests.test_metadata.TestMetadataMethod) ... ok
test_get_model_id_components (tests.test_metadata.TestMetadataMethod) ... ok
test_id_to_title (tests.test_metadata.TestMetadataMethod) ... ok

----------------------------------------------------------------------
Ran 10 tests in 0.003s

OK

I personally reviewed each commit you made; it looks sensible. Interesting that you switched to capitals for the size label, but that's not a big deal.

Going to proceed with the merge soon, as your remaining points above sound like they can be handled in a later PR anyway. Really appreciate the extra test cases.

@mofosyne mofosyne merged commit 672a6f1 into ggerganov:master Jul 18, 2024
9 checks passed
MoonRide303 commented Jul 19, 2024

@mofosyne This commit caused model conversion to stop working for me; I had to revert to b3412 to be able to convert glm-4-9b-chat. Since this change I get:

Traceback (most recent call last):
  File "D:\repos-git\llama.cpp\convert_hf_to_gguf.py", line 3668, in <module>
    main()
  File "D:\repos-git\llama.cpp\convert_hf_to_gguf.py", line 3661, in main
    model_instance.write()
  File "D:\repos-git\llama.cpp\convert_hf_to_gguf.py", line 400, in write
    self.prepare_metadata(vocab_only=False)
  File "D:\repos-git\llama.cpp\convert_hf_to_gguf.py", line 390, in prepare_metadata
    self.set_gguf_parameters()
  File "D:\repos-git\llama.cpp\convert_hf_to_gguf.py", line 3423, in set_gguf_parameters
    self.gguf_writer.add_name(self.hparams["_name_or_path"].split("/")[1]) # THUDM/glm4-9b-chat or THUDM/chatglm3-6b
  File "D:\repos-git\llama.cpp\gguf-py\gguf\gguf_writer.py", line 495, in add_name
    self.add_string(Keys.General.NAME, name)
  File "D:\repos-git\llama.cpp\gguf-py\gguf\gguf_writer.py", line 312, in add_string
    self.add_key_value(key, val, GGUFValueType.STRING)
  File "D:\repos-git\llama.cpp\gguf-py\gguf\gguf_writer.py", line 272, in add_key_value
    raise ValueError(f'Duplicated key name {key!r}')
ValueError: Duplicated key name 'general.name'

Conversion command: convert_hf_to_gguf.py --outtype q8_0 ..\glm-4-9b-chat\ --outfile glm-4-9b-chat-Q8_0.gguf.
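The traceback points at the writer's duplicate-key guard: the ChatGLM converter's explicit add_name call collides with general.name having already been written during prepare_metadata. The failing pattern is essentially the following simplified sketch (not the actual gguf-py source):

class KVWriterSketch:
    def __init__(self) -> None:
        self.kv_data: dict[str, object] = {}

    def add_key_value(self, key: str, val: object) -> None:
        # Writing the same key twice is refused rather than silently
        # overwritten, hence the ValueError in the traceback above.
        if key in self.kv_data:
            raise ValueError(f"Duplicated key name {key!r}")
        self.kv_data[key] = val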

mofosyne (Collaborator, Author) commented:

@MoonRide303 see if #8590 fixes your issue.

@mofosyne mofosyne deleted the refactor-convert-py branch July 19, 2024 14:04
MoonRide303 commented:
@mofosyne Yeah, 57b1d4f seems to be working fine. 👍

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Jul 27, 2024
…efactor (ggerganov#7499)

The main thing is that the default output filename will take this form:

{name}{parameters}{finetune}{version}{encoding}{kind}

In addition, this adds and removes some entries in the KV store, and adds a metadata class with automatic heuristics to derive some values from the model card content.

* No Change:
  - Internal GGUF Spec
    - `general.architecture`
    - `general.quantization_version`
    - `general.alignment`
    - `general.file_type`
  - General Model Details
    - `general.name`
    - `general.author`
    - `general.version`
    - `general.description`
  - Licensing details
    - `general.license`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.url`
  - Model Source during conversion
    - `general.source.url`

* Removed:
  - Model Source during conversion
    - `general.source.huggingface.repository`

* Added:
  - General Model Details
    - `general.organization`
    - `general.finetune`
    - `general.basename`
    - `general.quantized_by`
    - `general.size_label`
  - Licensing details
    - `general.license.name`
    - `general.license.link`
  - Typically represents the converted GGUF repo (Unless made from scratch)
    - `general.doi`
    - `general.uuid`
    - `general.repo_url`
  - Model Source during conversion
    - `general.source.doi`
    - `general.source.uuid`
    - `general.source.repo_url`
  - Base Model Source
    - `general.base_model.count`
    - `general.base_model.{id}.name`
    - `general.base_model.{id}.author`
    - `general.base_model.{id}.version`
    - `general.base_model.{id}.organization`
    - `general.base_model.{id}.url` (Model Website/Paper)
    - `general.base_model.{id}.doi`
    - `general.base_model.{id}.uuid`
    - `general.base_model.{id}.repo_url` (Model Source Repository (git/svn/etc...))
  - Array based KV stores
    - `general.tags`
    - `general.languages`
    - `general.datasets`

---------

Co-authored-by: compilade <[email protected]>
Co-authored-by: Xuan Son Nguyen <[email protected]>
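As a rough illustration of the added keys in use, a converter might populate them like this (a hypothetical call sequence with invented values, assuming gguf-py's string/array setters):

from gguf import GGUFWriter

writer = GGUFWriter("Example-8x7B-Q8_0.gguf", arch="llama")  # invented path/arch

# General model details (keys added by this PR)
writer.add_string("general.basename", "Example")         # invented value
writer.add_string("general.finetune", "Instruct")        # invented value
writer.add_string("general.organization", "ExampleOrg")  # invented value
writer.add_string("general.size_label", "8x7B")

# Licensing details
writer.add_string("general.license.name", "Apache License 2.0")
writer.add_string("general.license.link", "https://www.apache.org/licenses/LICENSE-2.0")

# Array-based KV stores
writer.add_array("general.tags", ["example", "instruct"])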