
[RFC] Implement a mechanism to detect the type of model being read #147

Open
louisgv opened this issue May 11, 2023 · 7 comments

Comments

@louisgv

louisgv commented May 11, 2023

With all the variants of ML models out now - gpt2/gptneox/llama/gptj - I wonder if there's a way to infer the model's type just from reading the file?...

Right now, if someone gives me a random model file with an obscure name, I'd first need to checksum it, then look up the hash on HF for the model card, then look through their docs/paper for the model type, and sometimes I'd still get confused between gptj/gptneox/llama hahah

@danforbes
Contributor

This would require some sort of central registry (something simple, just in the GGML source code) that maps uints to model architecture types. It's possible that the GGJT version could be used to convey this information (RWKV is already taking this approach with the "reserved" GGJT version of 100), or a new GGJT version could be introduced that conveys the model architecture ID separately.
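A minimal sketch of what such a registry could look like (Python for illustration; every name and numeric ID below is hypothetical, apart from 100, which mirrors the GGJT version RWKV already reserves):

```python
from enum import IntEnum

class ModelArchitecture(IntEnum):
    """Hypothetical central registry mapping uints to model architectures."""
    LLAMA = 1
    GPT2 = 2
    GPTJ = 3
    GPT_NEOX = 4
    RWKV = 100  # mirrors the "reserved" GGJT version RWKV already uses

def architecture_from_id(arch_id: int) -> ModelArchitecture:
    """Resolve an ID read from a model file, failing loudly on unknown values."""
    try:
        return ModelArchitecture(arch_id)
    except ValueError as exc:
        raise ValueError(f"unknown model architecture id: {arch_id}") from exc
```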

@klosax
Contributor

klosax commented May 11, 2023

Why not go even further? Make the common infrastructure of llama.cpp into something like "ggml-llm", and have the code for the specific LLM architectures (llama, gpt-2, gpt-j, mpt and others) become add-on modules at compile time.

@danforbes
Contributor

Why not go even further? Make the common infrastructure of llama.cpp into something like "ggml-llm", and have the code for the specific LLM architectures (llama, gpt-2, gpt-j, mpt and others) become add-on modules at compile time.

FWIW that sounds pretty similar to a Rust project I've been contributing to 😅 https://github.com/rustformers/llm

@louisgv
Author

louisgv commented May 11, 2023

Haha yeap - I originally proposed the idea in rustformers/llm. I thought it might make sense if there's some kind of metadata within ggml for quick retrieval of that info (?).

@philpax
Contributor

philpax commented May 11, 2023

I'm one of the maintainers of llm above - an issue I've noticed is that it's basically impossible to know what architecture you're dealing with, or how it's configured, given an arbitrary GGML file.

The best heuristic I can think of - matching up the tensor names - requires you to be able to locate the tensors, which requires you to skip past the hyperparameters, which requires you to know what hyperparameters to skip past.

Additionally, there are now variants of the same architecture with different configurations; RedPajama uses the GPT-NeoX architecture with use_parallel_residual set to false, while other GPT-NeoX models set it to true. There's no way to encode this information in the current GGML format.

I believe this is an issue that @LostRuins of the koboldcpp project has encountered, too: https://www.reddit.com/r/LocalLLaMA/comments/13bpqro/koboldcpp_added_new_redpajama_neox_support_would/


For the next version of the file format, I suggest replacing the hyperparameters with encoded key/value pairs (the format is up to you, but JSON's always easy), and then including the architecture and any other parameters in there, similar to config.json for HF models: https://huggingface.co/keldenl/RedPajama-INCITE-Chat-7B-v0.1-GGML/blob/main/config.json

This would allow readers to be able to identify the architecture and/or intelligently handle slight discrepancies in format.
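A rough sketch of what such a key/value header could look like when written and read back (Python for illustration; the length-prefixed JSON encoding and the specific field names are just one possibility, not a proposed standard):

```python
import json
import struct

def write_metadata(f, metadata: dict) -> None:
    """Serialize the key/value block as length-prefixed UTF-8 JSON."""
    blob = json.dumps(metadata).encode("utf-8")
    f.write(struct.pack("<I", len(blob)))
    f.write(blob)

def read_metadata(f) -> dict:
    """Read the block back; works without knowing the architecture in advance."""
    (length,) = struct.unpack("<I", f.read(4))
    return json.loads(f.read(length).decode("utf-8"))

# Illustrative metadata, mirroring the fields an HF config.json carries:
metadata = {
    "architecture": "gpt_neox",
    "use_parallel_residual": False,  # e.g. RedPajama's deviation from stock GPT-NeoX
    "n_embd": 4096,                  # remaining hyperparameters become named fields
    "n_layer": 32,
}
```

With something like this at the front of the file, a reader can branch on "architecture" (and on individual flags such as use_parallel_residual) before touching any tensors.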

@monatis
Contributor

monatis commented May 12, 2023

As far as I know, there's nothing like a "ggml file format" as there is in TensorFlow or PyTorch. It's an arbitrary binary file, and it's up to you how you implement it. For example, you could do the following:

  1. At the beginning of the current binary files, add the length of the architecture name as an integer, followed by the architecture name itself.
  2. In ggml code, first read the name length, then read the architecture name accordingly, and then pass the file pointer to the function that loads that particular architecture.

So it does not require a change in ggml code itself, and it can be implemented in user code. Am I missing something?
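A minimal sketch of that idea in user code (Python for brevity; the 32-bit little-endian length prefix, the UTF-8 encoding, and the loaders named in the usage comment are all assumptions):

```python
import struct

def write_arch_prefix(f, arch: str) -> None:
    """Prepend a length-prefixed architecture name ahead of the existing GGML data."""
    name = arch.encode("utf-8")
    f.write(struct.pack("<i", len(name)))
    f.write(name)

def read_arch_prefix(f) -> str:
    """Read the prefix back, leaving the file handle at the original GGML payload."""
    (name_len,) = struct.unpack("<i", f.read(4))
    return f.read(name_len).decode("utf-8")

# Usage (the per-architecture loaders are hypothetical):
# with open("model.bin", "rb") as f:
#     arch = read_arch_prefix(f)
#     loader = {"llama": load_llama, "gpt-neox": load_gpt_neox}[arch]
#     model = loader(f)  # the loader never needs to guess the architecture
```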

@philpax
Contributor

philpax commented May 12, 2023

There is a semi-formal GGML file format - it's what's produced by the convert-h5-to-ggml.py script here https://github.com/ggerganov/ggml/tree/master/examples/dolly-v2 or the convert.py here: https://github.com/ggerganov/llama.cpp/blob/master/convert.py, or the myriad other scripts that produce it.

There are now four variants of this format, and there are hundreds of GGML-format models floating around on Hugging Face. It is impossible to tell what architecture any of these models are for from the files alone, as their structure is just (magic number, binary hyperparameters in a fixed order with no identification, tensors).

That is to say - I'm entirely fine encoding the architecture into formats I control, but the GGML format has become somewhat of a standard, and its current iterations are not flexible enough to describe the complexity of the model ecosystem. That should be rectified sooner rather than later.
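To make the problem concrete, here is roughly all a reader can pull out of one of these files without already knowing the architecture (a sketch: the magic values are the ones the llama.cpp family of formats uses, and the fixed hyperparameter count is purely illustrative):

```python
import struct

def peek_header(path: str):
    """Everything recoverable from a legacy GGML/GGJT file up front: a magic
    number, possibly a version, then a run of unlabeled int32 hyperparameters."""
    with open(path, "rb") as f:
        (magic,) = struct.unpack("<I", f.read(4))
        version = None
        if magic == 0x67676A74:        # "ggjt" - the versioned variant (others exist too)
            (version,) = struct.unpack("<I", f.read(4))
        elif magic != 0x67676D6C:      # "ggml" - the original, unversioned variant
            raise ValueError("not a recognized GGML magic")
        # The next words are hyperparameters in a fixed, per-architecture order
        # (e.g. n_vocab, n_embd, n_layer, ...), but nothing in the file says which
        # architecture's order applies, so they are just anonymous integers here.
        hparams = [struct.unpack("<i", f.read(4))[0] for _ in range(7)]
    return magic, version, hparams
```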


For reference, we've been discussing what a stable model format would look like here: rustformers/llm#143
