
[RFC] Implement a mechanism to detect the type of model being read #147

Open
louisgv opened this issue May 11, 2023 · 7 comments

Comments

@louisgv

louisgv commented May 11, 2023

With all the variants of ML models out now - gpt2/gptneox/llama/gptj - I wonder if there's a way to infer the model's type just from reading the file?...

Right now, if someone gives me a random model file with an obscure name, I'd first need to checksum it, then look up the hash on HF for the model card, then look through their docs/paper for the model type, and sometimes I'd still get confused between gptj/gptneox/llama hahah

@danforbes
Contributor

This would require some sort of central registry (something simple, just in the GGML source code) that maps uints to model architecture types. It's possible that the GGJT version could be used to convey this information (RWKV is already taking this approach with the "reserved" GGJT version of 100), or a new GGJT version could be introduced that conveys the model architecture ID separately.
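A minimal sketch of what such a registry could look like (Python for illustration; every name and numeric ID below is hypothetical, apart from 100, which mirrors the GGJT version RWKV already reserves):

```python
from enum import IntEnum

class ModelArchitecture(IntEnum):
    """Hypothetical central registry mapping uints to model architectures."""
    LLAMA = 1
    GPT2 = 2
    GPTJ = 3
    GPT_NEOX = 4
    RWKV = 100  # mirrors the "reserved" GGJT version RWKV already uses

def architecture_from_id(arch_id: int) -> ModelArchitecture:
    """Resolve an ID read from a model file, failing loudly on unknown values."""
    try:
        return ModelArchitecture(arch_id)
    except ValueError as exc:
        raise ValueError(f"unknown model architecture id: {arch_id}") from exc
```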

@klosax
Contributor

klosax commented May 11, 2023

Why not go even further? Make the common infrastructure of llama.cpp into something like "ggml-llm", and have the code for the specific LLM architectures (llama, gpt-2, gpt-j, mpt and others) become add-on modules at compile time.

@danforbes
Contributor

Why not go even further? Make the common infrastructure of llama.cpp into something like "ggml-llm", and have the code for the specific LLM architectures (llama, gpt-2, gpt-j, mpt and others) become add-on modules at compile time.

FWIW that sounds pretty similar to a Rust project I've been contributing to 😅 https://github.com/rustformers/llm

@louisgv
Author

louisgv commented May 11, 2023

Haha yeap - I originally proposed the idea in rustformers/llm. I thought it might make sense if there's some kind of metadata within ggml for quick retrieval of that info (?).

@philpax
Contributor

philpax commented May 11, 2023

I'm one of the maintainers of llm above - an issue I've noticed is that it's basically impossible to know what architecture you're dealing with, or how it's configured, given an arbitrary GGML file.

The best heuristic I can think of - matching up the tensor names - requires you to be able to locate the tensors, which requires you to skip past the hyperparameters, which requires you to know what hyperparameters to skip past.

Additionally, there are now variants of the same architecture with different configurations; RedPajama uses the GPT-NeoX architecture with use_parallel_residual set to false, while other GPT-NeoX models set it to true. There's no way to encode this information in the current GGML format.

I believe this is an issue that @LostRuins of the koboldcpp project has encountered, too: https://www.reddit.com/r/LocalLLaMA/comments/13bpqro/koboldcpp_added_new_redpajama_neox_support_would/


For the next version of the file format, I suggest replacing the hyperparameters with encoded key/value pairs (the format is up to you, but JSON's always easy), and then including the architecture and any other parameters in there, similar to config.json for HF models: https://huggingface.co/keldenl/RedPajama-INCITE-Chat-7B-v0.1-GGML/blob/main/config.json

This would allow readers to be able to identify the architecture and/or intelligently handle slight discrepancies in format.
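A rough sketch of what such a key/value header could look like when written and read back (Python for illustration; the length-prefixed JSON encoding and the specific field names are just one possibility, not a proposed standard):

```python
import json
import struct

def write_metadata(f, metadata: dict) -> None:
    """Serialize the key/value block as length-prefixed UTF-8 JSON."""
    blob = json.dumps(metadata).encode("utf-8")
    f.write(struct.pack("<I", len(blob)))
    f.write(blob)

def read_metadata(f) -> dict:
    """Read the block back; works without knowing the architecture in advance."""
    (length,) = struct.unpack("<I", f.read(4))
    return json.loads(f.read(length).decode("utf-8"))

# Illustrative metadata, mirroring the fields an HF config.json carries:
metadata = {
    "architecture": "gpt_neox",
    "use_parallel_residual": False,  # e.g. RedPajama's deviation from stock GPT-NeoX
    "n_embd": 4096,                  # remaining hyperparameters become named fields
    "n_layer": 32,
}
```

With something like this at the front of the file, a reader can branch on "architecture" (and on individual flags such as use_parallel_residual) before touching any tensors.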

@monatis
Contributor

monatis commented May 12, 2023

As far as I know, there's nothing like a "ggml file format" as there is in TensorFlow or PyTorch. It's an arbitrary binary file, and it's up to you how you implement it. For example, you could do the following:

  1. At the beginning of the current binary files, add the length of the architecture name as an integer, followed by the architecture name itself.
  2. In ggml code, first read the name length, then read the architecture name accordingly, and then pass the file pointer to the function that loads that particular architecture.

So it does not require a change in ggml code itself, and it can be implemented in user code. Am I missing something?
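A minimal sketch of that idea in user code (Python for brevity; the 32-bit little-endian length prefix, the UTF-8 encoding, and the loaders named in the usage comment are all assumptions):

```python
import struct

def write_arch_prefix(f, arch: str) -> None:
    """Prepend a length-prefixed architecture name ahead of the existing GGML data."""
    name = arch.encode("utf-8")
    f.write(struct.pack("<i", len(name)))
    f.write(name)

def read_arch_prefix(f) -> str:
    """Read the prefix back, leaving the file handle at the original GGML payload."""
    (name_len,) = struct.unpack("<i", f.read(4))
    return f.read(name_len).decode("utf-8")

# Usage (the per-architecture loaders are hypothetical):
# with open("model.bin", "rb") as f:
#     arch = read_arch_prefix(f)
#     loader = {"llama": load_llama, "gpt-neox": load_gpt_neox}[arch]
#     model = loader(f)  # the loader never needs to guess the architecture
```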

@philpax
Contributor

philpax commented May 12, 2023

There is a semi-formal GGML file format - it's what's produced by the convert-h5-to-ggml.py script here https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2 or the convert.py here: https://github.com/ggerganov/llama.cpp/blob/master/convert.py, or the myriad other scripts that produce it.

There are now four variants of this format, and there are hundreds of GGML-format models floating around on Hugging Face. It is impossible to tell what architecture any of these models are for from the files alone, as their structure is just (magic number, binary hyperparameters in a fixed order with no identification, tensors).

That is to say - I'm entirely fine encoding the architecture into formats I control, but the GGML format has become somewhat of a standard, and its current iterations are not flexible enough to describe the complexity of the model ecosystem. That should be rectified sooner rather than later.
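To make the problem concrete, here is roughly all a reader can pull out of one of these files without already knowing the architecture (a sketch: the magic values are the ones the llama.cpp family of formats uses, and the fixed hyperparameter count is purely illustrative):

```python
import struct

def peek_header(path: str):
    """Everything recoverable from a legacy GGML/GGJT file up front: a magic
    number, possibly a version, then a run of unlabeled int32 hyperparameters."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        version = None
        if magic == 0x67676A74:        # "ggjt" - the versioned variant (others exist too)
            (version,) = struct.unpack("<I", f.read(4))
        elif magic != 0x67676D6C:      # "ggml" - the original, unversioned variant
            raise ValueError("not a recognized GGML magic")
        # The next words are hyperparameters in a fixed, per-architecture order
        # (e.g. n_vocab, n_embd, n_layer, ...), but nothing in the file says which
        # architecture's order applies, so they are just anonymous integers here.
        hparams = [struct.unpack("<i", f.read(4))[0] for _ in range(7)]
    return magic, version, hparams
```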


For reference, we've been discussing what a stable model format would look like here: rustformers/llm#143
