starcoder -- not enough space in the context's memory pool #158

Closed
bluecoconut opened this issue May 16, 2023 · 12 comments

@bluecoconut

I'm getting errors from the starcoder models whenever the prompt includes any non-trivial number of tokens. This happens with both my raw model (direct .bin) and my quantized model, regardless of version (both pre and post the Q4/Q5 changes).

Relevant error:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411790368)
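
As far as I can tell from the message itself, ggml bump-allocates every tensor out of the fixed memory pool handed to ggml_init, and this error fires when the next allocation would run past the end of that pool; the segfault that follows is presumably just the caller using the tensor that was never allocated. A self-contained toy of that failure mode (the names here are illustrative, not the real ggml internals):

#include <cstdio>
#include <cstddef>

// Toy bump allocator over a fixed pool: allocation fails (and reports
// needed vs. available) once the cursor would pass the end of the pool.
struct Pool { size_t mem_size; size_t cur_end; };

static bool pool_alloc(Pool & p, size_t size_needed) {
    if (p.cur_end + size_needed > p.mem_size) {
        fprintf(stderr, "not enough space in the context's memory pool (needed %zu, available %zu)\n",
                p.cur_end + size_needed, p.mem_size);
        return false; // the real allocator returns a null tensor here, which is later dereferenced
    }
    p.cur_end += size_needed;
    return true;
}

int main() {
    Pool p = { 256u*1024*1024, 0 };   // fixed 256 MB pool for this example
    pool_alloc(p, 200u*1024*1024);    // fits
    pool_alloc(p, 212u*1024*1024);    // does not fit -> prints the same style of error
    return 0;
}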

Example:

./build/bin/starcoder -m /workspaces/research/models/starcoder/starcoder-ggml.bin -p "def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test " --top_k 0 --top_p 0.95 --temp 0.2 

will cause the error

main: seed = 1684223471
starcoder_model_load: loading model from '/workspaces/research/models/starcoder/starcoder-ggml.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 1
starcoder_model_load: qntvr   = 0
starcoder_model_load: ggml ctx size = 51276.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 35916.23 MB
main: prompt: 'def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test '
main: number of tokens in prompt = 51, first 8 tokens: 589 28176 97 26 28176 97 28176 28176 

def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411952576)
Segmentation fault (core dumped)

(Here's another output from the quantized model)

vscode ➜ /workspaces/research/others/ggml (master) $ ./build/bin/starcoder -m /workspaces/research/models/starcoder/starcoder-ggml-q4_1.bin -p "def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test " --top_k 0 --top_p 0.95 --temp 0.2 
main: seed = 1684223600
starcoder_model_load: loading model from '/workspaces/research/models/starcoder/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 1003
starcoder_model_load: qntvr   = 1
starcoder_model_load: ggml ctx size = 28956.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 13596.23 MB
main: prompt: 'def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test '
main: number of tokens in prompt = 51, first 8 tokens: 589 28176 97 26 28176 97 28176 28176 

def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411790368)
Segmentation fault (core dumped)

The closest prior report I can find is ggerganov/llama.cpp#29

Maybe that was fixed for the llama models, but the problem has returned for starcoder?

Based on: #146

I'm specifically hoping @NouamaneTazi might have some insight into why this is happening.

@NouamaneTazi
Contributor

NouamaneTazi commented May 16, 2023

Interesting find! Thank you for raising this. Two questions:

@bluecoconut
Author

Just tried santacoder and it does seem to have the same problem, but at a very different scale (the error is the same). I had to put in >700, maybe around 1000 tokens, to trigger it, so this might just be a normal context-length issue?

Example code I used to test santacoder. (Note: this isn't run directly on the ggml executable but through ctransformers; the same errors show up as in the original post, where I ran the compiled ./starcoder directly, so I think it's safe to say it behaves the same on the underlying ggml.)

Python 3.10.11 (main, Apr 12 2023, 14:46:22) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lambdaprompt as lp
>>> import os
>>> os.environ['LAMBDAPROMPT_BACKEND'] = 'SantaCoderGGML'
>>> comp = lp.Completion("# Some code to print fibonacci numbers\n"*100, max_new_tokens=100)
>>> comp()
Fetching 0 files: 0it [00:00, ?it/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 25575.02it/s]
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268617232, available 268435456)
Segmentation fault (core dumped)

(I did one other test with "# Some code to print fibonacci numbers\n"*60 and this one successfully ran on santacoder)

>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*60))
720
>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*100))
1200

I'll try out starcoder.cpp and raw ggml with santacoder later, when I'm back at my machine.

@bluecoconut
Author

bigcode-project/starcoder.cpp#3

Seems someone else has run into this in starcoder.cpp as well.

@ggerganov
Owner

I tried looking into this, but the Python script from the example fails to download the model on macOS:

 $ ▶ python3 examples/starcoder/convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder
Loading model:  bigcode/gpt_bigcode-santacoder
Traceback (most recent call last):
  File "/Users/ggerganov/development/github/ggml/examples/starcoder/convert-hf-to-ggml.py", line 56, in <module>
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 766, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 473, in __getitem__
    raise KeyError(key)
KeyError: 'gpt_bigcode'

Any ideas how to fix this?

@NouamaneTazi
Contributor

@ggerganov I think you're on an old version of transformers
Try updating it: pip install -U transformers

@NouamaneTazi
Contributor

NouamaneTazi commented May 20, 2023

@ggerganov I've been trying to increase the context's memory pool by modifying this part of the code:

        ctx_size += 10 * 1024 * 1024; // TODO: tune this

        printf("%s: ggml ctx size = %6.2f MB\n", __func__, ctx_size/(1024.0*1024.0));

but it doesn't seem to affect ctx->mem_size, because the error message is always the same: ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268637760, available 268435456) (i.e. ctx->mem_size is still 268435456 when it should be larger)

Any idea how to increase ctx->mem_size?
Relevant PR

@ggerganov
Owner

The problem is in the "eval" context:

static size_t buf_size = 256u*1024*1024;
static void * buf = malloc(buf_size);

if (mem_per_token > 0 && mem_per_token*N > buf_size) {
    const size_t buf_size_new = 1.1*(mem_per_token*N); // add 10% to account for ggml object overhead
    //printf("\n%s: reallocating buffer from %zu to %zu bytes\n", __func__, buf_size, buf_size_new);

    // reallocate
    buf_size = buf_size_new;
    buf = realloc(buf, buf_size);
    if (buf == nullptr) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, buf_size);
        return false;
    }
}
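
That buf/buf_size pair is what becomes the eval context's memory pool, i.e. the ctx->mem_size reported in the error, which is why changing the model-load ctx_size has no effect on it. Roughly (the field comments assume the usual ggml_init_params layout):

// Later in the same eval function, the buffer is handed to ggml_init(),
// so the context's pool size is exactly buf_size:
struct ggml_init_params params = {
    /*.mem_size   =*/ buf_size,
    /*.mem_buffer =*/ buf,
    /*.no_alloc   =*/ false,
};

struct ggml_context * ctx0 = ggml_init(params);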

Currently, it starts with a 256 MB buffer that is grown based on N alone.
This does not take n_past into account, and in general it is a very memory-wasteful approach, since the results of the entire compute graph are stored in this buffer.
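
For illustration only, a naive patch would be to grow the buffer based on the full sequence length processed so far rather than just the new chunk; this is a sketch of the idea, not the change I made:

// Hypothetical adjustment (illustration only): size the buffer for the whole
// sequence seen so far (n_past + N) instead of only the new chunk N.
const size_t needed = (size_t) (1.1*mem_per_token*(n_past + N)); // keep 10% headroom
if (mem_per_token > 0 && needed > buf_size) {
    buf_size = needed;
    buf = realloc(buf, buf_size);
    if (buf == nullptr) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, buf_size);
        return false;
    }
}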

Here I tried to improve this using scratch buffers: #176
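
Roughly, the scratch-buffer pattern looks like this (the buffer names and sizes here are placeholders, not the exact values from that PR):

// Placeholder scratch buffer: per-layer intermediate tensors are allocated in
// here and the space is reused from layer to layer, so the main eval context
// no longer has to hold the results of the entire compute graph.
static size_t scr0_size = 256u*1024*1024;
static void * scr0      = malloc(scr0_size);

// inside the per-layer loop of the eval function:
ggml_set_scratch(ctx0, { 0, scr0_size, scr0, });   // route new tensors into the scratch buffer
// ... build the attention / MLP nodes for this layer ...
ggml_set_scratch(ctx0, { 0, 0, nullptr, });        // switch back for tensors that must persist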

Please give it a try and let me know if your tests still crash with this version.

@vmajor

vmajor commented Jun 10, 2023

I am observing a similar issue with the Python wrapper llama-cpp-python:
abetlen/llama-cpp-python#356

@eshaanagarwal

Hi, I was trying the GPT4All 1.3 groovy model and I faced the same issue. I am not able to understand why this is happening. Can anybody provide me with a solution?

@vmajor

vmajor commented Jun 13, 2023

@eshaanagarwal the only "solution" I found was a reboot. Since rebooting is not an option, I had to switch to different models. For me, all 30B/33B models eventually develop this error when the input context approaches its upper limit. It does not affect the 65B models. I don't know of any other patterns, as this is just my use case.

@eshaanagarwal


@ggerganov can the memory leak or the underlying issue be fixed? Or can you point me in a possible direction for fixing it? I really need this model to work.

@ggerganov
Owner

@eshaanagarwal If you are using the latest version of the starcoder example, the issue should not occur. It was fixed in #176

If the issue still occurs, please provide more details about the model you are using, your system information, and the parameters with which you trigger the error.
