
Extend ggml format to include a description of the model. #1575

Closed
darxkies opened this issue May 23, 2023 · 14 comments
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@darxkies

On Hugging Face there are many files called ggml-model-f16.bin or similar. Once downloaded, the user can rename them, and the information about their origin gets lost. Updating a file becomes difficult when its origin is unknown. It would be easier to extend the ggml format so that the creator could embed a description of the model when generating it using 'quantize'.

@LostRuins
Collaborator

not a fan of this idea. Not only would it break all prior formats for little reason again, it would also be unnecessary padding for those who don't need such information. And how large would you make it? GGML is a packed format. It's not like JSON where you can define arbitrary new fields of arbitrary lengths.

My suggestion instead would be to include such metadata either in the file name, or as an accompanying .txt file.

@darxkies
Author

I am not very familiar with the structure of ggml files. I only know that it is a binary format and that it is very compact.

One way to solve it, without having to change the format entirely, is simply to append the text to the end of the ggml file. If the file is longer than it should be, based on the header, then the rest of the file is either ignored or treated as "meta" data. Just an idea.
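A minimal sketch of how a loader could detect such a trailing blob, assuming it already knows the offset at which the tensor data ends (the function and the choice of JSON here are illustrative, not part of any real ggml API):

```python
import json
import os

def read_trailing_metadata(path: str, expected_end: int):
    """Return metadata appended past the expected end of the model, if any.

    `expected_end` is the file offset at which the header says the tensor
    data finishes; everything after it is treated as an opaque blob.
    """
    actual_size = os.path.getsize(path)
    if actual_size <= expected_end:
        return None  # no trailing blob appended
    with open(path, "rb") as f:
        f.seek(expected_end)
        blob = f.read(actual_size - expected_end)
    # Assuming the creator chose to store JSON in the blob:
    return json.loads(blob)
```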

@darxkies
Author

That "blob" at the end of the ggml could be also used to describe the different prompt formats, stop sequences, and so on, that the model supports. ggml wouldn't have to know how to interpret the blob. It would just have to read it and pass it on.

@howard0su
Collaborator

I like this idea, as there are so many different llama-family models with different prompts. We already have the vocab embedded; we could certainly add other information like prompts and a description, even license information.

@philpax

philpax commented May 26, 2023

not a fan of this idea. Not only would it break all prior formats for little reason again, it would also be unnecessary padding for those who don't need such information.

I'm a little surprised by your position on this - it's impossible to know what model architecture you're dealing with, or how it should be configured, using the current GGML format, because all the contextual information is lost.

You can't figure out if you have a LLaMA or a GPT-NeoX model without hacks, because the only things you can use to identify that (the tensor names) are located after the hyperparameters, which you need to know the model architecture to read. You can scan through the file for the tensor names as strings, but that's brittle and unnecessary.

I'd happily take another format break (as long as it's managed correctly with a migration path!) if it allows all GGML consumers (including koboldcpp, llm, etc.) to be able to run any arbitrary supported model, and for future models to extend the information they include without requiring a format break. (Imagine if the inclusion of use_parallel_residual in the GPT-NeoX architecture's hyperparameters didn't create an incompatible variant of the format!)

There's an issue on the GGML repo about this: ggerganov/ggml#147

As for padding... the models are already gigabytes. Adding a few kilobytes is unnoticeable.

And how large would you make it? GGML is a packed format. It's not like JSON where you can define arbitrary new fields of arbitrary lengths.

Just embedding JSON would be an easy fix for this, but that would require a JSON reader/writer to be available, which I assume is out of scope.

What I'd propose as an intermediary solution is a very simple binary key-value format, where the key is stored as a length-prefixed string (the same as the tensor names), and the value is stored as a (type tag, value) pair. These k/v pairs are unordered, so they can be present in any order, and new hyperparameters can be added without requiring a breaking format change.

Model authors can then use this mechanism to include additional information about source/license/prompting, if they're so inclined.

My suggestion instead would be to include such metadata either in the file name, or as an accompanying .txt file.

That requires model creators to be consistent with filenames and to maintain a filename schema, which is a large ask of a community compared to just embedding the relevant information in the model itself.

Text files can also be very easily lost, and would need to be structured if they're meant to be consumed by model loaders.

One of the greatest strengths of the GGML format is its one-file-one-model solution; unlike HuggingFace, where you have to clone an entire folder, you can distribute an entire model as one file, as long as you have a compatible executor. We should make the most of this.

One way to solve it, without having to change the format entirely, is simply to append the text to the end of the ggml file. If the file is longer than it should be, based on the header, then the rest of the file is either ignored or treated as "meta" data. Just an idea.

For llm, I was considering embedding this extra contextual information as a U8 tensor containing JSON, but I was concerned about a loader trying to load that faux-tensor as an actual tensor. That isn't a problem for any of the primary executors at present (since they look for specific tensors), but I didn't want to risk it.
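For illustration, that faux-tensor workaround might have looked something like the following; the tensor name and the use of NumPy here are assumptions, not something llm actually shipped:

```python
import json
import numpy as np

# Serialize the metadata as JSON and wrap it in a 1-D u8 "tensor"
# under a hypothetical reserved name such as "__metadata__".
meta = {"architecture": "gpt-neox", "license": "Apache-2.0"}
blob = json.dumps(meta).encode("utf-8")
faux_tensor = np.frombuffer(blob, dtype=np.uint8)

# A cooperating loader looks the tensor up by name, decodes it back to
# JSON, and skips it during inference; a naive loader might instead try
# to use it as actual weights, which is exactly the risk described above.
decoded = json.loads(faux_tensor.tobytes().decode("utf-8"))
```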

@LostRuins
Collaborator

@philpax hmm I get your point, but I think it will end up as a https://xkcd.com/927/ situation.

The problem is that such a "free comment field" is by definition arbitrary data. It's a big, unstructured scratch pad: anyone who wants to add their own thing will do so, and then, as an integrator, it becomes even more work, since you can't ensure that the field you want exists and everyone will end up shoving whatever they want into it. If everyone were to use JSON that would already be hard, but at least gracefully handling an extra or missing field isn't that difficult.

But data in a packed struct? We already have one situation like this on the ggml repo, where earlier NeoX models do not have the use_parallel_residual field, which ends up being at the ftype offset (because it's a packed struct). I had to resort to some unpleasant hacks for my loader to handle both old and new NeoX formats correctly.
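To make that breakage concrete, here is a small sketch (all field values made up) of how inserting a field into a packed header shifts everything that follows it:

```python
import struct

# Old GPT-NeoX header: n_vocab, n_ctx, n_embd, n_head, n_layer, n_rot, ftype
old = struct.pack("<7i", 50257, 2048, 2560, 32, 32, 20, 2)

# Newer files insert use_parallel_residual before ftype.
new = struct.pack("<8i", 50257, 2048, 2560, 32, 32, 20, 1, 2)

# A reader expecting the new layout, given an old file, misreads the old
# ftype (2) as use_parallel_residual and then reads whatever comes next
# (here, four padding bytes standing in for tensor data) as ftype.
*_, upr, ftype = struct.unpack("<8i", old + b"\x00\x00\x00\x00")
print(upr, ftype)  # -> 2 0: both fields are now wrong
```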

Now imagine if every NeoX author was adding random fields at their discretion: maybe RedPajama uses the first 100 bytes of this block to add some extra vocab, meanwhile a Pythia enthusiast uses it to store default stochastic sampler values, and then a third party just stores a giant UTF-8 string that contains the huggingface tags for their model, because why not.

Bear in mind that since it's free-form data, there's no indicator and no enforced standard, so even the same author might do different things for different versions.

It would be basically unusable.

@philpax

philpax commented May 28, 2023

Sorry, meant to get back to you earlier.

I completely agree with you about the mess - I brought up the use_parallel_residual break because it was annoying for us, too.

That's why I'm suggesting that it's structured, and that ordering is irrelevant. That is, instead of storing the hyperparameters as

n_vocab: i32,
n_ctx: i32,
n_embd: i32,
n_head: i32,
n_layer: i32,
n_rot: i32,
use_parallel_residual: bool,
file_type: i32,

it's instead stored as an array of

key_length: u32,
key: [u8; key_length],
value_type: ValueType,
value: raw binary little-endian representation of value

so that you might have

[
  {
    key_length: 6,
    key: 'n_embd',
    value_type: ValueType::I32,
    value: 2560
  },
  {
    key_length: 21,
    key: 'use_parallel_residual',
    value_type: ValueType::Bool,
    value: true
  },
  ...
]

The brackets are for notational convenience - in practice, the entries are packed flat, one after another, in the binary. The ValueType enum would be standardized (like ggml_type), and so would the ways to represent each type of value.

This would allow for the addition of more parameters, and for readers to be more resilient to models coming from other sources, etc., because you'd be looking values up by key and decoding them by type tag, rather than relying on fixed offsets.

It wouldn't be freeform - the storage medium would be entirely structured, so that any reader could pick up data from it without having to know about the other fields. As time goes on, I imagine this would look like ID3v2, with commonly-used tags being standardized by the community for whatever metadata they want to attach.
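A minimal reader for this layout might look like the following sketch; the numeric type tags and the idea of a pair count in the header are assumptions for illustration, not a finalized spec:

```python
import struct
from typing import BinaryIO

# Hypothetical type tags; the real enum would be standardized like ggml_type.
def _read_i32(f): return struct.unpack("<i", f.read(4))[0]
def _read_f32(f): return struct.unpack("<f", f.read(4))[0]
def _read_bool(f): return f.read(1) != b"\x00"
def _read_str(f):
    (n,) = struct.unpack("<I", f.read(4))
    return f.read(n).decode("utf-8")

VALUE_READERS = {0: _read_i32, 1: _read_f32, 2: _read_bool, 3: _read_str}

def read_kv_section(f: BinaryIO, n_pairs: int) -> dict:
    """Read n_pairs length-prefixed (key, type tag, value) entries.

    Because every value is self-describing, the entries can appear in
    any order and a reader just looks up the keys it cares about.
    """
    pairs = {}
    for _ in range(n_pairs):
        (key_length,) = struct.unpack("<I", f.read(4))
        key = f.read(key_length).decode("utf-8")
        (value_type,) = struct.unpack("<I", f.read(4))
        pairs[key] = VALUE_READERS[value_type](f)
    return pairs
```

A loader would then do something like pairs.get('use_parallel_residual', True), falling back to a default when a key is absent; that fallback is exactly what makes future additions non-breaking.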

The main thing I want to achieve is to a) allow the reading of a GGML file while knowing nothing else about it, even if you can't do anything with it, and b) allow community model authors to add useful metadata in a way that won't cause breakage for future readers, while still remaining maximally compatible.

@ggerganov
Owner

@philpax

I agree with the proposed extension - we should implement it

@cmp-nct
Contributor

cmp-nct commented May 28, 2023

In the longer run, what would be cool is a ggzip package containing:

  • config.json (flat structure with primitive types only)
  • license.txt (all licenses applicable to the model)
  • the weights binary itself, similar to now
    When generated with "mmap support", the zip compression level would be 0, which should allow mapping the binary 1:1 from within the zip.
    Of course, this hypothetical ggzip format would be generated just like gg files are generated now.

The primary benefit of that approach is that, a bit like PyTorch, a human-readable JSON could define how the weights are to be used.
Especially with superior (and legal) open models like Falcon 40B, and completely free models like those of Stability, pushing it out, "llama" is likely to soon be a relic.
All those new models need slightly different processing; they need an adaptive or specialized eval loop.
This is going to get worse with more and more superior models in the coming months.
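A rough sketch of that packaging idea using Python's zipfile, with ZIP_STORED (compression 0) for the weights so their bytes stay contiguous and mappable; all file names here are hypothetical:

```python
import json
import zipfile

# Write the hypothetical package: metadata can be compressed, but the
# weights are STORED so they remain byte-identical inside the zip.
with zipfile.ZipFile("model.ggzip", "w") as z:
    z.writestr("config.json", json.dumps({"architecture": "falcon"}),
               compress_type=zipfile.ZIP_DEFLATED)
    z.writestr("license.txt", "Apache License 2.0",
               compress_type=zipfile.ZIP_DEFLATED)
    z.write("ggml-model-f16.bin", "model.bin",
            compress_type=zipfile.ZIP_STORED)

# A loader can locate the stored weight bytes without extracting them:
with zipfile.ZipFile("model.ggzip") as z:
    info = z.getinfo("model.bin")
    # The raw data begins after the variable-length local file header at
    # info.header_offset; a real mmap loader would parse that header to
    # compute the exact data offset (and worry about page alignment).
    print(info.header_offset, info.file_size)
```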

@LostRuins
Collaborator

LostRuins commented May 29, 2023

I like the flexibility of @philpax's suggestion. A few fields should be enforced as mandatory for a model to be considered compliant - perhaps the currently existing fields. Adding a field to indicate the architecture name would be nice too.

If the file header itself changes, maybe we should change the file magic one last time, e.g. gguf for g-g-universal-format (actually I don't care, it could be anything different; just a random idea that popped up, lol).

Some long-reaching considerations would be what to enforce: a max length for a key/value? A max amount of space reserved for these values? Any sort of padding or alignment between elements to reserve space for future uses which we cannot think of right now?

@philpax

philpax commented May 29, 2023

Awesome! Yeah, the magic might be worth changing if we make a change this comprehensive; I don't have any particularly strong opinions on that (GGUF / GGJTv4 would be handled the same way for us).

Regarding the considerations: good questions, I have no immediate answers but I'm fine with shipping without. Realistically, I can't imagine the metadata being a large part of the model compared to the tensors, and people who abuse the flexibility in their uploaded models will be policed by the community.

@mgroeber9110
Contributor

I stumbled across this issue looking through the "help wanted" tag, and I am wondering: is this issue still relevant, or has the goal been achieved by the switch to GGUF? As far as I understand, the new format follows the principles described here (while #220 appears to be more ambitious and also includes discussions about a unified conversion tool).

@philpax

philpax commented Oct 6, 2023

Yes, I would say this is more or less technically complete.

@arnfaldur

Since the GGUF file format implements this, this issue is resolved and should be closed.
