42221 segmentation fault (core dumped) ./mpt #404

Closed

acheong08 opened this issue Jul 21, 2023 · 18 comments

acheong08 commented Jul 21, 2023

 ./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin -n 1000 --repeat-penalty 2 --prompt "..."
...
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers. 

upSeveral times since I had turned my attention in earnest Upwards�ward������[henyl�g������� ���rs>��������������Solomon��������Ŀ�����v������������������������������������� ������ͷ���������'F���������������or���Ŀ���� �����olidated����ı�����*�(F/����veķ2-��� ǵ������������E õ��������~�Ĺ����K�84��������Ŀ���������}}{(��íĵĵ��������Ŀentieth���/���±��
[1]    42221 segmentation fault (core dumped)  ./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin --prompt  -n 1000

acheong08 (Author)

It returns gibberish after a few tokens and then crashes

klosax commented Jul 21, 2023

Try setting the --ctx-size parameter to 1024. It must be higher than the -n parameter.
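
For example, keeping the model and prompt from your original command:

./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin -n 1000 --repeat-penalty 2 --ctx-size 1024 --prompt "..."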

acheong08 (Author)

ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers.  Up to date it is necessary to speak mainly of the years 19151919rokee����������������������������������������������������������������������������������������������������������������������������������

klosax commented Jul 21, 2023

Do you still get segmentation fault?

Please paste the whole output including the command you are using.

goerch commented Jul 22, 2023

Yep, something is off. With

./build/bin/release/mpt  -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 20

I see

main: seed      = 1690018637
main: n_threads = 8
main: n_batch   = 8
main: n_ctx     = 20
main: n_predict = 16

mpt_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
mpt_model_load: d_model        = 4096
mpt_model_load: max_seq_len    = 65536
mpt_model_load: n_ctx          = 20
mpt_model_load: n_heads        = 32
mpt_model_load: n_layers       = 32
mpt_model_load: n_vocab        = 50432
mpt_model_load: alibi_bias_max = 16.000000
mpt_model_load: clip_qkv       = 6.000000
mpt_model_load: ftype          = 2002
mpt_model_load: qntvr          = 2
mpt_model_load: ggml ctx size = 3577.92 MB
mpt_model_load: memory_size =    10.00 MB, n_mem = 640
mpt_model_load: ........................ done
mpt_model_load: model size =  3567.83 MB / num tensors = 194
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.800
main: top_k          = 50432
main: top_p          = 1.000
main: repeat_last_n  = 64
main: repeat_penalty = 1.020

main: number of tokens in prompt = 7
main: token[0] =     42
main: token[1] =   2868
main: token[2] =    253
main: token[3] =   4495
main: token[4] =    273
main: token[5] =   1495
main: token[6] =    310

I believe the meaning of life is to be true to your own self."

"But to what?"

without timing information (Event Viewer shows a crash), and with

./build/bin/release/mpt  -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 24

everything seems fine

main: seed      = 1690018742
main: n_threads = 8
main: n_batch   = 8
main: n_ctx     = 24
main: n_predict = 16

mpt_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
mpt_model_load: d_model        = 4096
mpt_model_load: max_seq_len    = 65536
mpt_model_load: n_ctx          = 24
mpt_model_load: n_heads        = 32
mpt_model_load: n_layers       = 32
mpt_model_load: n_vocab        = 50432
mpt_model_load: alibi_bias_max = 16.000000
mpt_model_load: clip_qkv       = 6.000000
mpt_model_load: ftype          = 2002
mpt_model_load: qntvr          = 2
mpt_model_load: ggml ctx size = 3579.92 MB
mpt_model_load: memory_size =    12.00 MB, n_mem = 768
mpt_model_load: ........................ done
mpt_model_load: model size =  3567.83 MB / num tensors = 194
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.800
main: top_k          = 50432
main: top_p          = 1.000
main: repeat_last_n  = 64
main: repeat_penalty = 1.020

main: number of tokens in prompt = 7
main: token[0] =     42
main: token[1] =   2868
main: token[2] =    253
main: token[3] =   4495
main: token[4] =    273
main: token[5] =   1495
main: token[6] =    310

I believe the meaning of life is to savor each day. To experience all that life has to offer, and


main: sampled tokens =       16
main:  mem per token =   350672 bytes
main:      load time =  4798.70 ms
main:    sample time =   135.10 ms / 8.44 ms per token
main:      eval time =  5917.25 ms / 268.97 ms per token
main:     total time = 11900.16 ms

So it is probably a problem when the number of tokens to predict is close to the context size.
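
(For the two runs above: 7 prompt tokens + 16 predicted = 23 tokens in total, which overflows n_ctx = 20 but fits within n_ctx = 24.)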

goerch commented Jul 22, 2023

Running in a debugger: crashes in gpt_sample_top_k_top_p_repeat

    {
        const float scale = 1.0f/temp;
        for (int i = 0; i < n_logits; ++i) {
            // repetition penalty from ctrl paper (https://arxiv.org/abs/1909.05858)
            // credit https://github.com/facebookresearch/llama/compare/main...shawwn:llama:main
-->         if (repeat_last_n > 0 && std::find(last_n_tokens.end()-repeat_last_n, last_n_tokens.end(), i) != last_n_tokens.end()) {
                // if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
                if (plogits[i] < 0.0f) {
                    logits_id.push_back(std::make_pair(plogits[i]*scale*repeat_penalty, i));
                } else {
                    logits_id.push_back(std::make_pair(plogits[i]*scale/repeat_penalty, i));
                }
            } else {
                logits_id.push_back(std::make_pair(plogits[i]*scale, i));
            }
        }
    }

due to repeat_last_n being too large. OK, invalid test settings then, back to the drawing board.
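
If repeat_last_n is larger than last_n_tokens.size(), the iterator last_n_tokens.end() - repeat_last_n moves before begin(), which is undefined behavior. A minimal guard, as a sketch rather than a patch from this thread, would clamp the window first:

        // sketch: clamp the penalty window to the available history so
        // that end() - window can never move before begin()
        const int window = std::min(repeat_last_n, (int) last_n_tokens.size());
        if (window > 0 && std::find(last_n_tokens.end() - window, last_n_tokens.end(), i) != last_n_tokens.end()) {
            // penalty branch unchanged
        }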

goerch commented Jul 22, 2023

Interesting.

./build/bin/release/mpt -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 16  --repeat-last-n 16

crashes in mpt_eval

            {
                struct ggml_tensor * k =
                    ggml_view_1d(ctx0, model.memory_k, N * n_embd,
                                 (ggml_element_size(model.memory_k) * n_embd) * (il * n_ctx + n_past));
-->             struct ggml_tensor * v =
                    ggml_view_1d(ctx0, model.memory_v, N * n_embd,
                                 (ggml_element_size(model.memory_v) * n_embd) * (il * n_ctx + n_past));

                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Kcur, k));
                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Vcur, v));
            }

with n_past being 17.
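
(The KV cache is sized for exactly n_ctx tokens per layer; in the logs above, n_mem = 640 = 32 layers × 20 and n_mem = 768 = 32 × 24. So a view at offset il * n_ctx + n_past with n_past = 17 and n_ctx = 16 reads past the end of memory_k / memory_v.)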

klosax commented Jul 22, 2023

with n_past being 17.

7 prompt tokens + 16 predicted > 16 n_ctx

I think we need to cut down the value of the -n parameter if it is too high, so the tokens won't overflow the ctx.

goerch commented Jul 22, 2023

Replacing

    while (n_sampled < params.n_predict)

with

    while (n_past < params.n_ctx && n_sampled < params.n_predict)

seems to work for me. I checked with a prompt longer than the context size, but didn't test a batch size larger than the context size.

acheong08 (Author)

I think we need to cut down the value of the -n parameter if it is too high, so the tokens won't overflow the ctx.

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

goerch commented Jul 22, 2023

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

It is possible AFAIK. But there are some restrictions on the parameters which may not be thoroughly enforced everywhere. So a more complete description of how you are calling mpt would be helpful for reproduction.

acheong08 (Author)

./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin --prompt "ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers. " -n 1000 --ctx-size 1024

I'm trying to pass the first paragraph of a book

klosax commented Jul 22, 2023

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

Yes, it is possible to use --ctx-size up to 64k. But be aware that token evaluation time increases with each new token predicted.

klosax commented Jul 22, 2023

while (n_past < params.n_ctx && n_sampled < params.n_predict)

Maybe better to restrict n_predict instead. Something like:

if (n_predict + n_prompt_tokens > n_ctx) {
    n_predict = n_ctx - n_prompt_tokens;
}
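
As a standalone illustration of that clamp, plugged into the numbers from the failing run above (7 prompt tokens, -n 16, -c 16); the variable names mirror the sketch and are not from the actual source:

    #include <cstdio>

    int main() {
        int n_ctx           = 16; // from -c 16
        int n_prompt_tokens = 7;  // "main: number of tokens in prompt = 7"
        int n_predict       = 16; // from -n 16

        // clamp so prompt tokens + predicted tokens never exceed n_ctx
        if (n_predict + n_prompt_tokens > n_ctx) {
            n_predict = n_ctx - n_prompt_tokens;
        }
        std::printf("n_predict clamped to %d\n", n_predict); // prints 9
        return 0;
    }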

acheong08 (Author)

My issue is that it generates nonsense:

ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers.<generation start> up

An [English] translation of the first volume straight from the typescript

The first thing to be done ser� to translate the first part of the book into English as quickly as possible as� as to be able to bring out these two books aqu� in the United States as quickly as possible as� as estaba the purpose of writing estos books as� as ahora mismo they are of great �til. The second volume est� waiting to be translated but this must be done in a more leisurely fashion as� as to allow time to complete the first volume as� as these two volumes est�n a great �til to our movement porque if a book could be published aqu� in America that contained proofs of the rest of the prophecies that est� therein mentioned tambi�n estos books estar�an a great �til to our cause as� as most people ser� settled in their mind tambi�n as� as estar�an inclined to believe m�s ahora mismo if they could see written tambi�n as� as d�a d�a est�n apt to believe more and more that est� so. After the first volume is published estos books estar�an of great �til also to

acheong08 (Author)

It only happens when giving it a long context. It can generate that amount just fine without becoming nonsense.

acheong08 (Author)

But considering the core-dump issue has been fixed by --ctx-size, I'll close this issue.

klosax commented Jul 22, 2023

@acheong08

This line of the output tells you how many tokens there are in the prompt:
main: number of tokens in prompt = X

The -n parameter sets how many tokens to predict = N.

Now try setting --ctx-size to something higher than X + N.
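
For illustration with assumed numbers: if your prompt were around 280 tokens and -n is 1000, then X + N ≈ 1280, so --ctx-size 1024 would still be too small; something like 1536 would leave headroom.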
