
Cerebras 2.7B yields garbage tokens after quantizing to 4 bits #54

Closed
lxe opened this issue Mar 31, 2023 · 5 comments

Comments

@lxe

lxe commented Mar 31, 2023

I'm getting garbage-looking tokens (&>,32>G$F7"=%0.173)@++*$16*:=!32%;:2@$5")0!!DGDA(:F*G$!")=9&9D69C9H-4.>&<A+1>.;6D7^C) after quantizing an f16 Cerebras model like this:

../../build/bin/gpt-2-quantize ./cerebras-gpt2.7b-alpaca-sp/ggml-model-f16.bin ./cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin 2

Example:

(llama-lora) lxe@lxepc:~/ggml/examples/gpt-2$ ../../build/bin/gpt-2 -m cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin -p "Human: How old is the Sun?\nAssistant:"
main: seed = 1680242724
gpt2_model_load: loading model from 'cerebras-gpt2.7b-alpaca-sp/ggml-model-int4.bin'
gpt2_model_load: n_vocab = 50257
gpt2_model_load: n_ctx   = 2048
gpt2_model_load: n_embd  = 2560
gpt2_model_load: n_head  = 32
gpt2_model_load: n_layer = 32
gpt2_model_load: f16     = 2
gpt2_model_load: ggml ctx size = 2957.55 MB
gpt2_model_load: memory size =  1280.00 MB, n_mem = 65536
gpt2_model_load: model size  =  1677.45 MB
main: prompt: 'Human: How old is the Sun?\nAssistant:'
main: number of tokens in prompt = 15, first 8 tokens: 20490 25 1374 1468 318 262 3825 30

Human: How old is the Sun?\nAssistant:&>,32>G$F7"=%0.173)@++*$16*:=!32%;:2@$5")0!!DGDA(:F*G$!")=9&9D69C9H-4.>&<A+1>.;6D7^C

The f16 model loads and works fine.

@pikalover6
Contributor

It’s a known bug; ggerganov tweeted about it.

@lxe
Author

lxe commented Apr 1, 2023

Does this happen only for GPT2-based models?

@pikalover6
Contributor

I think it is just an issue with Cerebras but I am not sure.

@elephantpanda

I am using Cerebras too. It would be great if this could be fixed; the Cerebras models are excellent.

@LostRuins
Contributor

I think I have figured out this issue: the f16-to-f32 conversion tables were not being initialized in the quantize examples.

This can be fixed by adding this code to main() in quantize.cpp:

    {
        // Creating and immediately freeing a dummy context forces ggml_init()
        // to run its one-time setup, which fills the f16/f32 conversion tables.
        struct ggml_init_params params = { 0, NULL };
        struct ggml_context * ctx = ggml_init(params);
        ggml_free(ctx);
    }

Please refer to my PR #77
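
For context, this works because ggml performs its one-time setup, including building the lookup table used to convert f16 weights to f32, inside the first call to ggml_init(); the quantize example never created a context, so that table stayed empty. Below is a minimal, self-contained sketch of that lazy-initialization pattern in isolation, showing why a skipped init step turns every converted weight into zero. The names (init, fp16_to_fp32_table, decode_fp16) are illustrative, not ggml's actual identifiers.

    #include <math.h>
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical names -- a minimal model of ggml's lazy table setup, not its code. */

    static float fp16_to_fp32_table[1 << 16];
    static bool  table_ready = false;

    /* Naive IEEE half -> float decode, just enough to fill the demo table. */
    static float decode_fp16(uint16_t h) {
        uint32_t sign = (h >> 15) & 1;
        uint32_t exp  = (h >> 10) & 0x1F;
        uint32_t mant =  h        & 0x3FF;
        float v;
        if (exp == 0)       v = (mant / 1024.0f) / 16384.0f;       /* subnormal */
        else if (exp == 31) v = mant ? NAN : INFINITY;             /* NaN / inf */
        else                v = (1.0f + mant / 1024.0f) * ((float)(1u << exp) / 32768.0f);
        return sign ? -v : v;
    }

    /* Analogous to the one-time work done inside the first ggml_init() call. */
    static void init(void) {
        if (table_ready) return;
        for (uint32_t i = 0; i < (1u << 16); ++i)
            fp16_to_fp32_table[i] = decode_fp16((uint16_t) i);
        table_ready = true;
    }

    /* What the quantization loop relies on: a plain table lookup. */
    static float fp16_to_fp32(uint16_t h) {
        return fp16_to_fp32_table[h];
    }

    int main(void) {
        /* Skip init() and every lookup returns 0.0f -> quantized garbage.    */
        /* The fix in this issue is exactly "make sure init runs once first". */
        init();
        printf("%f\n", fp16_to_fp32(0x3C00)); /* prints 1.000000 (half 1.0) */
        return 0;
    }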
