
Cerebras 2.7B yields garbage tokens after quantizing to 4 bits #54

Closed
lxe opened this issue Mar 31, 2023 · 5 comments

Comments

@lxe

lxe commented Mar 31, 2023

I'm getting garbage-looking tokens (&>,32>G$F7"=%0.173)@++*$16*:=!32%;:2@$5")0!!DGDA(:F*G$!")=9&9D69C9H-4.>&<A+1>.;6D7^C) after quantizing an f16 Cerebras model like this:

../../build/bin/gpt-2-quantize ./cerebras-gpt2.7b-alpaca-sp/ggml-model-f16.bin ./cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin 2

Example:

(llama-lora) lxe@lxepc:~/ggml/examples/gpt-2$ ../../build/bin/gpt-2 -m cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin -p "Human: How old is the Sun?\nAssistant:"
main: seed = 1680242724
gpt2_model_load: loading model from 'cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 2048
gpt2_model_load: n_embd  = 2560
gpt2_model_load: n_head  = 32
gpt2_model_load: n_layer = 32
gpt2_model_load: f16     = 2
gpt2_model_load: ggml ctx size = 2957.55 MB
gpt2_model_load: memory size =  1280.00 MB, n_mem = 65536
gpt2_model_load: model size  =  1677.45 MB
main: prompt: 'Human: How old is the Sun?\nAssistant:'
main: number of tokens in prompt = 15, first 8 tokens: 20490 25 1374 1468 318 262 3825 30

Human: How old is the Sun?\nAssistant:&>,32>G$F7"=%0.173)@++*$16*:=!32%;:2@$5")0!!DGDA(:F*G$!")=9&9D69C9H-4.>&<A+1>.;6D7^C

The f16 model loads and works fine.

@pikalover6
Contributor

It’s a known bug; ggerganov tweeted about it.

@lxe
Author

lxe commented Apr 1, 2023

Does this happen only for GPT2-based models?

@pikalover6
Contributor

I think it is just an issue with Cerebras but I am not sure.

@elephantpanda

I am using Cerebras too. It would be great if this could be fixed; the Cerebras models are excellent.

@LostRuins
Contributor

I think I have figured out this issue: the f16-to-f32 conversion tables were not being initialized in the quantize examples.

This can be fixed by adding this code to main() in quantize.cpp:

    {
        // Creating and immediately freeing a dummy context forces ggml_init()
        // to run its one-time setup, which fills the f16/f32 conversion tables.
        struct ggml_init_params params = { 0, NULL };
        struct ggml_context * ctx = ggml_init(params);
        ggml_free(ctx);
    }

Please refer to my PR #77
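
For context, this works because ggml performs its one-time setup, including building the lookup table used to convert f16 weights to f32, inside the first call to ggml_init(); the quantize example never created a context, so that table stayed empty. Below is a minimal, self-contained sketch of that lazy-initialization pattern in isolation, showing why a skipped init step turns every converted weight into zero. The names (init, fp16_to_fp32_table, decode_fp16) are illustrative, not ggml's actual identifiers.

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical names -- a minimal model of ggml's lazy table setup, not its code. */

    static float fp16_to_fp32_table[1 << 16];
    static bool  table_ready = false;

    /* Naive IEEE half -> float decode, just enough to fill the demo table. */
    static float decode_fp16(uint16_t h) {
        uint32_t sign = (h >> 15) & 1;
        uint32_t exp  = (h >> 10) & 0x1F;
        uint32_t mant =  h        & 0x3FF;
        float v;
        if (exp == 0)       v = (mant / 1024.0f) / 16384.0f;       /* subnormal */
        else if (exp == 31) v = mant ? NAN : INFINITY;             /* NaN / inf */
        else                v = (1.0f + mant / 1024.0f) * ((float)(1u << exp) / 32768.0f);
        return sign ? -v : v;
    }

    /* Analogous to the one-time work done inside the first ggml_init() call. */
    static void init(void) {
        if (table_ready) return;
        for (uint32_t i = 0; i < (1u << 16); ++i)
            fp16_to_fp32_table[i] = decode_fp16((uint16_t) i);
        table_ready = true;
    }

    /* What the quantization loop relies on: a plain table lookup. */
    static float fp16_to_fp32(uint16_t h) {
        return fp16_to_fp32_table[h];
    }

    int main(void) {
        /* Skip init() and every lookup returns 0.0f -> quantized garbage.    */
        /* The fix in this issue is exactly "make sure init runs once first". */
        init();
        printf("%f\n", fp16_to_fp32(0x3C00)); /* prints 1.000000 (half 1.0) */
        return 0;
    }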
