Quantizing GPT-J Produces Nonsense #71

Closed
zanussbaum opened this issue Apr 6, 2023 · 4 comments

@zanussbaum commented Apr 6, 2023

Hey, thanks for the great package! When I try to quantize an fp16 ggml file of GPT-J, the outputs from chat are nonsense. The output of the gpt-j-quantize binary also looks off: I'd expect the histogram to have several non-zero values, as in other examples like llama.cpp's quantize.

 0.000 0.000 
                     transformer.h.0.ln_1.weight - [ 4096,     1], type =    f32 size =    0.016 MB
                       transformer.h.0.ln_1.bias - [ 4096,     1], type =    f32 size =    0.016 MB
              transformer.h.0.attn.k_proj.weight - [ 4096,  4096], type =    f16 quantizing .. size =    64.00 MB ->    10.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
              transformer.h.0.attn.v_proj.weight - [ 4096,  4096], type =    f16 quantizing .. size =    64.00 MB ->    10.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
              transformer.h.0.attn.q_proj.weight - [ 4096,  4096], type =    f16 quantizing .. size =    64.00 MB ->    10.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
            transformer.h.0.attn.out_proj.weight - [ 4096,  4096], type =    f16 quantizing .. size =    64.00 MB ->    10.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
                transformer.h.0.mlp.fc_in.weight - [ 4096, 16384], type =    f16 quantizing .. size =   256.00 MB ->    40.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
                  transformer.h.0.mlp.fc_in.bias - [16384,     1], type =    f32 size =    0.062 MB
               transformer.h.0.mlp.fc_out.weight - [16384,  4096], type =    f16 quantizing .. size =   256.00 MB ->    40.00 MB | hist: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 
                 transformer.h.0.mlp.fc_out.bias - [ 4096,     1], type =    f32 size =    0.016 MB
                     transformer.h.1.ln_1.weight - [ 4096,     1], type =    f32 size =    0.016 

Is quantizing from fp16 not possible for GPT-J?
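
For reference, the hist: columns in the log appear to be normalized counts of the quantized 4-bit values across 16 buckets, so a healthy tensor should spread mass over several buckets. Below is a minimal sketch of how such a histogram can be tallied; the two-nibbles-per-byte packing is an assumption for illustration, not a claim about the exact Q4_0 block layout.

    // Illustrative sketch only: tally 4-bit quantized values into 16 buckets and
    // print normalized fractions, mirroring the "hist:" line in the quantize log.
    #include <stdint.h>
    #include <stdio.h>

    static void print_hist(const uint8_t * qs, size_t n_bytes) {
        int64_t hist[16] = {0};
        const int64_t total = 2 * (int64_t) n_bytes;   // two 4-bit values per byte (assumed packing)

        for (size_t i = 0; i < n_bytes; ++i) {
            hist[qs[i] & 0x0F]++;          // low nibble
            hist[(qs[i] >> 4) & 0x0F]++;   // high nibble
        }

        printf("hist:");
        for (int i = 0; i < 16; ++i) {
            // A healthy tensor spreads mass across buckets; all of it landing in a
            // single bucket (as in the log above) means every weight quantized to
            // the same value.
            printf(" %.3f", total > 0 ? (double) hist[i] / (double) total : 0.0);
        }
        printf("\n");
    }

    int main(void) {
        const uint8_t collapsed[] = { 0x88, 0x88, 0x88, 0x88 };   // every 4-bit value == 8
        print_hist(collapsed, sizeof(collapsed));                  // -> 1.000 in bucket 8 only
        return 0;
    }

Read that way, the all-in-one-bucket histograms above suggest every f16 weight is being converted to the same value before quantization, which points at the f16 conversion rather than the quantizer itself.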

@RaymondCrandall commented Apr 6, 2023

I might not be explaining this well, but it looks like the shapes printed for

transformer.h.0.mlp.fc_in.bias
transformer.h.0.mlp.fc_out.weight

might need to be in the opposite order.

The comment

// The multi-dimensional tensors are stored in row-major order. The ggml_tensor struct contains fields for the

and the rest of the model structure suggest the shapes would plausibly be described as [rows, columns], but maybe I'm wrong or confused.

EDIT: definitely wrong and confused.

@LostRuins (Contributor)

Hi @RaymondCrandall @zanussbaum @ggerganov, I think I have figured out this issue: the f16-to-f32 tables were not properly initialized in the quantize examples.

This can be fixed by adding the following code to main() in quantize.cpp:

    {
        // Creating and immediately freeing a throwaway context forces ggml_init()
        // to populate its internal f16 <-> f32 conversion tables before any
        // weights are quantized.
        struct ggml_init_params params = { 0, NULL };
        struct ggml_context * ctx = ggml_init(params);
        ggml_free(ctx);
    }

Please refer to my PR #77
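
As a toy sketch of the failure mode described above (not the actual ggml code): the f16 weights are converted through precomputed lookup tables that ggml_init() fills, and if such a table is still all zeros, every weight converts to 0.0f, so every value quantizes identically, matching the single-bucket histograms in the original report. The table name and layout below are assumptions for illustration only.

    #include <stdint.h>
    #include <stdio.h>

    // Hypothetical stand-in for an internal conversion table: one float per
    // possible 16-bit half-precision pattern, meant to be filled during init.
    static float f16_to_f32_table[1 << 16];

    // Convert by table lookup. If nothing has filled the table yet, every
    // input maps to 0.0f.
    static float f16_to_f32(uint16_t bits) {
        return f16_to_f32_table[bits];
    }

    int main(void) {
        const uint16_t half_one = 0x3C00;   // 1.0 in IEEE 754 half precision
        // Prints 0.000000: with an unfilled table, all f16 weights collapse to
        // the same value, and quantization then puts everything in one bucket.
        printf("uninitialized lookup: %f\n", f16_to_f32(half_one));
        return 0;
    }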

@manyoso (Contributor) commented Apr 14, 2023

This was integrated and can be closed, yes?

@LostRuins (Contributor)

It should be. I am already using it in my fork with correct results; the quantization works. You can tell from the histogram outputs during quantization: if they show a spread of different values, the quantization is correct.

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023