
GPTNeoX model with 16k context results in context-size related issues: ggml_new_tensor_impl: not enough space in the context's memory pool, and instant core dump with fp16 #225

Open
TheBloke opened this issue Jun 3, 2023 · 2 comments

TheBloke (Contributor) commented Jun 3, 2023

Hey guys

Today I was doing quants of a new GPTNeoX model called Literature-7B-16384

I tried making GGMLs through the usual process:

python examples/gpt-neox/convert-h5-to-ggml.py /workspace/models/hakurei_Literature-7B-16384 0
build/bin/gpt-neox-quantize /workspace/process/literature-7b/ggml/ggml-model-f32.bin /workspace/process/literature-7b/ggml/literature-7b-16384.gptneox.ggmlv3.q4_0.bin q4_0

Both steps completed fine, but the resulting models can't be used.

Trying to use the fp32:

[pytorch2] ubuntu@h100:/workspace/git/ggml git:(master) $ build/bin/gpt-neox -m /workspace/process/literature-7b/ggml/ggml-model-f32.bin   -p "test"
main: seed = 1685827980
gpt_neox_model_load: loading model from '/workspace/process/literature-7b/ggml/ggml-model-f32.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx   = 16384
gpt_neox_model_load: n_embd  = 4096
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot   = 128
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 0
gpt_neox_model_load: qntvr   = 0
gpt_neox_model_load: ggml ctx size = 11822.28 MB
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12661385472, available 12396563456)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12661451264, available 12396563456)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12594392320, available 12396563456)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 12460224000, available 12396563456)
.... lots of similar lines removed ...
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 16691523072, available 12396563456)
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 16691523072, available 12396563456)
[1]    1178727 segmentation fault (core dumped)  build/bin/gpt-neox -m /workspace/process/literature-7b/ggml/ggml-model-f32.bi
[pytorch2] ubuntu@h100:/workspace/git/ggml git:(master) $

Trying an fp16 conversion instead is even more spectacular:

[pytorch2] ubuntu@h100:/workspace/git/ggml git:(master) $ build/bin/gpt-neox -m /workspace/process/literature-7b/ggml/ggml-model-f16.bin -n 100  -p "test"
main: seed = 1685827752
gpt_neox_model_load: loading model from '/workspace/process/literature-7b/ggml/ggml-model-f16.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx   = 16384
gpt_neox_model_load: n_embd  = 4096
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot   = 128
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 1
gpt_neox_model_load: qntvr   = 0
gpt_neox_model_load: ggml ctx size = 17592186043162.29 MB
GGML_ASSERT: /workspace/git/ggml/src/ggml.c:3982: ctx->mem_buffer != NULL
[1]    1178038 abort (core dumped)  build/bin/gpt-neox -m /workspace/process/literature-7b/ggml/ggml-model-f16.bi

Trying a quantised version made from either the fp32 or the fp16 gives the same kind of error as the fp32 model:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 116195584, available 265216)

I tried various -n values with both files but that made no difference.

I assume this is because some support needs to be added for the unusually large context size? I have previously tested GPTNeoX models with 4k and 8k context and they seemed to work.

I don't know if this is a bug or a feature request, but I thought I'd let you guys know. Let me know if you'd like me to upload the fp16, fp32 or q4_0 GGMLs anywhere for inspection.

Thanks in advance!

klosax (Contributor) commented Jun 3, 2023

gpt_neox_model_load: ggml ctx size = 17592186043162.29 MB

It seems to be an overflow from mixing signed and unsigned integer arithmetic. With these hparams the KV-cache term n_ctx * n_layer * n_embd is 16384 × 32 × 4096 = 2^31 elements, one more than INT_MAX, so the 32-bit signed product wraps negative, and accumulating it into the size_t context size wraps again to just under 2^64 bytes, hence the ~17592186043162 MB.

Change int to size_t in these lines (in examples/gpt-neox/main.cpp):

        const int n_embd  = hparams.n_embd;
        const int n_layer = hparams.n_layer;
        const int n_ctx   = hparams.n_ctx;
        const int n_vocab = hparams.n_vocab;

to

        const size_t n_embd  = hparams.n_embd;
        const size_t n_layer = hparams.n_layer;
        const size_t n_ctx   = hparams.n_ctx;
        const size_t n_vocab = hparams.n_vocab;
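
Here is a minimal standalone sketch of the suspected wraparound, using the hparams from the logs above (it only models the arithmetic; the real accumulation in main.cpp also multiplies in the per-type element size):

// overflow_demo.cpp -- build with: g++ overflow_demo.cpp && ./a.out
// hparams values are taken from the logs above.
#include <cstdint>
#include <cstdio>

int main() {
    const int64_t n_ctx = 16384, n_layer = 32, n_embd = 4096;

    // KV-cache elements per tensor: exactly 2^31, one more than INT32_MAX.
    const int64_t elems = n_ctx * n_layer * n_embd;

    // With 32-bit int operands the product wraps to INT32_MIN (modelled
    // here by truncating the correct 64-bit value; in the real code the
    // signed overflow is undefined behaviour).
    const int32_t wrapped = (int32_t) elems;

    // Accumulating the negative value into a size_t wraps around zero to
    // just under 2^64 bytes -- matching the absurd
    // "ggml ctx size = 17592186043162.29 MB" in the fp16 log.
    size_t ctx_size = 0;
    ctx_size += (int64_t) wrapped * 2; // 2 bytes per f16 KV element

    printf("elems    = %lld\n", (long long) elems);
    printf("wrapped  = %d\n", (int) wrapped);
    printf("ctx_size = %zu bytes = %.2f MB\n",
           ctx_size, ctx_size / (1024.0 * 1024.0));
    return 0;
}

Declaring the hparams locals as size_t promotes the whole product to unsigned 64-bit arithmetic, so the 8 GB of f16 KV cache (2 × 2^31 elements × 2 bytes, matching the memory_size = 8192.00 MB line below) is counted correctly instead of wrapping.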

With that change, a q8_0 quantization is working:

./main -m litterature-7b-q8_0.bin 
main: seed = 1685837187
gpt_neox_model_load: loading model from 'litterature-7b-q8_0.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx   = 16384
gpt_neox_model_load: n_embd  = 4096
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot   = 128
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 2007
gpt_neox_model_load: qntvr   = 2
gpt_neox_model_load: ggml ctx size = 25384.91 MB
gpt_neox_model_load: memory_size =  8192.00 MB, n_mem = 524288
gpt_neox_model_load: ................................................ done
gpt_neox_model_load: model size =  6953.16 MB / num tensors = 388
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.
main: number of tokens in prompt = 1
main: token[0] =   3726, They

They-the-heavens! I've been sitting here, and he's never come back!^C

TheBloke (Contributor, Author) commented Jun 4, 2023

Thanks so much! I will test and close this shortly.
