42221 segmentation fault (core dumped) ./mpt #404

Closed

acheong08 opened this issue Jul 21, 2023 · 18 comments

acheong08 commented Jul 21, 2023

 ./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin -n 1000 --repeat-penalty 2 --prompt "..."
...
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers. 

upSeveral times since I had turned my attention in earnest Upwards�ward������[henyl�g������� ���rs>��������������Solomon��������Ŀ�����v������������������������������������� ������ͷ���������'F���������������or���Ŀ���� �����olidated����ı�����*�(F/����veķ2-��� ǵ������������E õ��������~�Ĺ����K�84��������Ŀ���������}}{(��íĵĵ��������Ŀentieth���/���±��
[1]    42221 segmentation fault (core dumped)  ./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin --prompt  -n 1000

acheong08 (Author)

It returns gibberish after a few tokens and then crashes

klosax commented Jul 21, 2023

Try setting the --ctx-size parameter to 1024. It must be higher than the -n parameter.
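
For example, keeping the model and prompt from your original command:

./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin -n 1000 --repeat-penalty 2 --ctx-size 1024 --prompt "..."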

acheong08 (Author)

ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers.  Up to date it is necessary to speak mainly of the years 19151919rokee����������������������������������������������������������������������������������������������������������������������������������

klosax commented Jul 21, 2023

Do you still get segmentation fault?

Please paste the whole output including the command you are using.

goerch commented Jul 22, 2023

Yep, something is off. With

./build/bin/release/mpt  -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 20

I see

main: seed      = 1690018637
main: n_threads = 8
main: n_batch   = 8
main: n_ctx     = 20
main: n_predict = 16

mpt_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
mpt_model_load: d_model        = 4096
mpt_model_load: max_seq_len    = 65536
mpt_model_load: n_ctx          = 20
mpt_model_load: n_heads        = 32
mpt_model_load: n_layers       = 32
mpt_model_load: n_vocab        = 50432
mpt_model_load: alibi_bias_max = 16.000000
mpt_model_load: clip_qkv       = 6.000000
mpt_model_load: ftype          = 2002
mpt_model_load: qntvr          = 2
mpt_model_load: ggml ctx size = 3577.92 MB
mpt_model_load: memory_size =    10.00 MB, n_mem = 640
mpt_model_load: ........................ done
mpt_model_load: model size =  3567.83 MB / num tensors = 194
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.800
main: top_k          = 50432
main: top_p          = 1.000
main: repeat_last_n  = 64
main: repeat_penalty = 1.020

main: number of tokens in prompt = 7
main: token[0] =     42
main: token[1] =   2868
main: token[2] =    253
main: token[3] =   4495
main: token[4] =    273
main: token[5] =   1495
main: token[6] =    310

I believe the meaning of life is to be true to your own self."

"But to what?"

without timing information (Event Viewer shows a crash), and with

./build/bin/release/mpt  -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 24

everything seems fine

main: seed      = 1690018742
main: n_threads = 8
main: n_batch   = 8
main: n_ctx     = 24
main: n_predict = 16

mpt_model_load: loading model from 'ggml-model-q4_0.bin' - please wait ...
mpt_model_load: d_model        = 4096
mpt_model_load: max_seq_len    = 65536
mpt_model_load: n_ctx          = 24
mpt_model_load: n_heads        = 32
mpt_model_load: n_layers       = 32
mpt_model_load: n_vocab        = 50432
mpt_model_load: alibi_bias_max = 16.000000
mpt_model_load: clip_qkv       = 6.000000
mpt_model_load: ftype          = 2002
mpt_model_load: qntvr          = 2
mpt_model_load: ggml ctx size = 3579.92 MB
mpt_model_load: memory_size =    12.00 MB, n_mem = 768
mpt_model_load: ........................ done
mpt_model_load: model size =  3567.83 MB / num tensors = 194
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp           = 0.800
main: top_k          = 50432
main: top_p          = 1.000
main: repeat_last_n  = 64
main: repeat_penalty = 1.020

main: number of tokens in prompt = 7
main: token[0] =     42
main: token[1] =   2868
main: token[2] =    253
main: token[3] =   4495
main: token[4] =    273
main: token[5] =   1495
main: token[6] =    310

I believe the meaning of life is to savor each day. To experience all that life has to offer, and


main: sampled tokens =       16
main:  mem per token =   350672 bytes
main:      load time =  4798.70 ms
main:    sample time =   135.10 ms / 8.44 ms per token
main:      eval time =  5917.25 ms / 268.97 ms per token
main:     total time = 11900.16 ms

So it is probably a problem when the number of tokens to predict is close to the context size.
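
(For the two runs above: 7 prompt tokens + 16 predicted = 23 tokens in total, which overflows n_ctx = 20 but fits within n_ctx = 24.)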

goerch commented Jul 22, 2023

Running in a debugger: crashes in gpt_sample_top_k_top_p_repeat

    {
        const float scale = 1.0f/temp;
        for (int i = 0; i < n_logits; ++i) {
            // repetition penalty from ctrl paper (https://arxiv.org/abs/1909.05858)
            // credit https://github.com/facebookresearch/llama/compare/main...shawwn:llama:main
-->         if (repeat_last_n > 0 && std::find(last_n_tokens.end()-repeat_last_n, last_n_tokens.end(), i) != last_n_tokens.end()) {
                // if score < 0 then repetition penalty has to be multiplied to reduce the previous token probability
                if (plogits[i] < 0.0f) {
                    logits_id.push_back(std::make_pair(plogits[i]*scale*repeat_penalty, i));
                } else {
                    logits_id.push_back(std::make_pair(plogits[i]*scale/repeat_penalty, i));
                }
            } else {
                logits_id.push_back(std::make_pair(plogits[i]*scale, i));
            }
        }
    }

due to repeat_last_n being too large. OK, invalid test settings then, back to the drawing board.
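
If repeat_last_n is larger than last_n_tokens.size(), the iterator last_n_tokens.end() - repeat_last_n moves before begin(), which is undefined behavior. A minimal guard, as a sketch rather than a patch from this thread, would clamp the window first:

        // sketch: clamp the penalty window to the available history so
        // that end() - window can never move before begin()
        const int window = std::min(repeat_last_n, (int) last_n_tokens.size());
        if (window > 0 && std::find(last_n_tokens.end() - window, last_n_tokens.end(), i) != last_n_tokens.end()) {
            // penalty branch unchanged
        }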

goerch commented Jul 22, 2023

Interesting.

./build/bin/release/mpt -m ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 16 -c 16  --repeat-last-n 16

crashes in mpt_eval

            {
                struct ggml_tensor * k =
                    ggml_view_1d(ctx0, model.memory_k, N * n_embd,
                                 (ggml_element_size(model.memory_k) * n_embd) * (il * n_ctx + n_past));
-->             struct ggml_tensor * v =
                    ggml_view_1d(ctx0, model.memory_v, N * n_embd,
                                 (ggml_element_size(model.memory_v) * n_embd) * (il * n_ctx + n_past));

                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Kcur, k));
                ggml_build_forward_expand(&gf, ggml_cpy(ctx0, Vcur, v));
            }

with n_past being 17.
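
(The KV cache is sized for exactly n_ctx tokens per layer; in the logs above, n_mem = 640 = 32 layers × 20 and n_mem = 768 = 32 × 24. So a view at offset il * n_ctx + n_past with n_past = 17 and n_ctx = 16 reads past the end of memory_k / memory_v.)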

klosax commented Jul 22, 2023

with n_past being 17.

7 prompt tokens + 16 predicted > 16 n_ctx

I think we need to cut down the value of the -n parameter if it is too high, so the tokens won't overflow the ctx.

goerch commented Jul 22, 2023

Replacing

    while (n_sampled < params.n_predict)

with

    while (n_past < params.n_ctx && n_sampled < params.n_predict)

seems to work for me. I checked with a prompt longer than the context size, but didn't test a batch size larger than the context size.

acheong08 (Author)

I think we need to cut down the value of the -n parameter if it is too high, so the tokens won't overflow the ctx.

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

goerch commented Jul 22, 2023

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

It is possible AFAIK. But there are some restrictions on the parameters which may not be thoroughly enforced everywhere. So a more complete description of how you are calling mpt would be helpful for reproduction.

acheong08 (Author)

./mpt -m ~/.models/ggml/mpt-7b-storywriter-ggml_v2-q5_1.bin --prompt "ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers. " -n 1000 --ctx-size 1024

I'm trying to pass the first paragraph of a book

klosax commented Jul 22, 2023

Isn't mpt storywriter meant to be used with large contexts? Is that not possible with ggml?

Yes, it is possible to use --ctx-size up to 64k. But be aware that token evaluation time increases with each new token predicted.

klosax commented Jul 22, 2023

while (n_past < params.n_ctx && n_sampled < params.n_predict)

Maybe better to restrict n_predict instead. Something like:

if (n_predict + n_prompt_tokens > n_ctx) {
    n_predict = n_ctx - n_prompt_tokens;
}
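
As a standalone illustration of that clamp, plugged into the numbers from the failing run above (7 prompt tokens, -n 16, -c 16); the variable names mirror the sketch and are not from the actual source:

    #include <cstdio>

    int main() {
        int n_ctx           = 16; // from -c 16
        int n_prompt_tokens = 7;  // "main: number of tokens in prompt = 7"
        int n_predict       = 16; // from -n 16

        // clamp so prompt tokens + predicted tokens never exceed n_ctx
        if (n_predict + n_prompt_tokens > n_ctx) {
            n_predict = n_ctx - n_prompt_tokens;
        }
        std::printf("n_predict clamped to %d\n", n_predict); // prints 9
        return 0;
    }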

acheong08 (Author)

My issue is that it generates nonsense:

ON APRIL I, 1924, because of the sentence handed down by the People's Court of
Munich, I had to begin that day, serving my term in the fortress at Landsberg on the
Lech. 
Thus, after years of uninterrupted work, I was afforded for the first time an
opportunity to embark on a task insisted upon by many and felt to be serviceable to
the movement by myself. Therefore, I resolved not only to set forth, in two volumes,
the object of our movement, but also to draw a picture of its development. From
this more can be learned than from any purely doctrinary treatise. 
That also gave me the opportunity to describe my own development, as far as this is
necessary for the understanding of the first as well as the second volume, and which
may serve to destroy the evil legends created about my person by the Jewish press. 
With this work I do not address myself to strangers, but to those adherents of the
movement who belong to it with their hearts and whose reason now seeks a more
intimate enlightenment. I know that one is able to win people far more by the
spoken than by the written word, and that every great movement on this globe owes
its rise to the great speakers and not to the great writers.<generation start> up

An [English] translation of the first volume straight from the typescript

The first thing to be done ser� to translate the first part of the book into English as quickly as possible as� as to be able to bring out these two books aqu� in the United States as quickly as possible as� as estaba the purpose of writing estos books as� as ahora mismo they are of great �til. The second volume est� waiting to be translated but this must be done in a more leisurely fashion as� as to allow time to complete the first volume as� as these two volumes est�n a great �til to our movement porque if a book could be published aqu� in America that contained proofs of the rest of the prophecies that est� therein mentioned tambi�n estos books estar�an a great �til to our cause as� as most people ser� settled in their mind tambi�n as� as estar�an inclined to believe m�s ahora mismo if they could see written tambi�n as� as d�a d�a est�n apt to believe more and more that est� so. After the first volume is published estos books estar�an of great �til also to

acheong08 (Author)

It only happens when giving it a long context. It can generate that amount just fine without becoming nonsense.

acheong08 (Author)

But considering the core-dump issue has been fixed by --ctx-size, I'll close this issue.

klosax commented Jul 22, 2023

@acheong08

This line of the output tells you how many tokens there are in the prompt:
main: number of tokens in prompt = X

The -n parameter sets how many tokens to predict = N.

Now try setting --ctx-size to something higher than X + N.
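
For illustration with assumed numbers: if your prompt were around 280 tokens and -n is 1000, then X + N ≈ 1280, so --ctx-size 1024 would still be too small; something like 1536 would leave headroom.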
