
MPT quantize does not include quantization version #168

Closed
philpax opened this issue May 19, 2023 · 7 comments
@philpax
Contributor

philpax commented May 19, 2023

Hi there!

The concept of quantization versions was recently introduced in commit effcfa6, which encodes the version in the model's ftype.

However, it looks like this change didn't make it over to the MPT quantizer:

fout.write((char *)&ftype, sizeof(hparams.ftype));
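For context, a minimal sketch of what the versioned write presumably looks like, following the convention from effcfa6 (GGML_QNT_VERSION and GGML_QNT_VERSION_FACTOR are the constants defined in ggml.h; this is an illustration, not the exact patch):

// Sketch: fold the quantization version into the ftype before writing,
// as effcfa6 does for the other quantizers.
const int32_t ftype_dst = GGML_QNT_VERSION * GGML_QNT_VERSION_FACTOR + ftype;
fout.write((const char *) &ftype_dst, sizeof(ftype_dst));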

This means that the MPT models recently uploaded by @TheBloke at https://huggingface.co/TheBloke/MPT-7B-GGML unfortunately do not include the quantization version. The impact is mitigated by the fact that the MPT example does not currently check the version.

We're checking the version in llm, which is why this has come up. We may add an option to disable the check or override the perceived version, but I figured that it was worth reporting upstream before it proliferates.

@TheBloke
Contributor

Thanks for the heads up. I'm happy to redo them with the fixed quantization code.

@TheBloke
Contributor

TheBloke commented May 19, 2023

Going to bed now, but as soon as the fix is in I can redo them tomorrow.

@lukasmoellerch
Contributor

Yes, this was merged while I was working on the MPT model integration. I'll contribute a fix.

@philpax
Contributor Author

philpax commented May 19, 2023

Also worth noting: the code that loads the model needs to account for this (e.g. it needs to take the ftype modulo GGML_QNT_VERSION_FACTOR).
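A minimal sketch of what that loader-side handling might look like, assuming the GGML_QNT_VERSION / GGML_QNT_VERSION_FACTOR constants from ggml.h (the error handling here is illustrative, not the project's actual code):

// Sketch: split the stored ftype into the quantization version
// and the actual ftype, inside the model loader.
const int32_t qntvr = hparams.ftype / GGML_QNT_VERSION_FACTOR;
hparams.ftype %= GGML_QNT_VERSION_FACTOR;

if (qntvr > GGML_QNT_VERSION) {
    fprintf(stderr, "%s: unsupported quantization version %d\n", __func__, qntvr);
    return false;
}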

@marella
Contributor

marella commented May 20, 2023

@lukasmoellerch I sent #165 to remove global variables. Could you please also include those changes in your fix?

Also, max_seq_len is read from the model file but doesn't seem to be used anywhere. Is that expected, or is it supposed to be used as n_ctx?

@ggerganov
Owner

Added quantization version support to MPT and Replit models

philpax closed this as completed May 20, 2023
@TheBloke
Contributor

Thanks very much for the fix. I've updated my three repos:
https://huggingface.co/TheBloke/MPT-7B-GGML
https://huggingface.co/TheBloke/MPT-7B-Instruct-GGML
https://huggingface.co/TheBloke/MPT-7B-Storywriter-GGML
