starcoder -- not enough space in the context's memory pool #158

Closed
bluecoconut opened this issue May 16, 2023 · 12 comments

@bluecoconut

I'm getting errors from the starcoder models whenever the prompt includes any non-trivial number of tokens. This happens with both my raw model (direct .bin) and my quantized model, regardless of version (both pre and post the Q4/Q5 changes).

Relevant error:

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411790368)
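
As far as I can tell from the message itself, ggml bump-allocates every tensor out of the fixed memory pool handed to ggml_init, and this error fires when the next allocation would run past the end of that pool; the segfault that follows is presumably just the caller using the tensor that was never allocated. A self-contained toy of that failure mode (the names here are illustrative, not the real ggml internals):

#include <cstdio>
#include <cstddef>

// Toy bump allocator over a fixed pool: allocation fails (and reports
// needed vs. available) once the cursor would pass the end of the pool.
struct Pool { size_t mem_size; size_t cur_end; };

static bool pool_alloc(Pool & p, size_t size_needed) {
    if (p.cur_end + size_needed > p.mem_size) {
        fprintf(stderr, "not enough space in the context's memory pool (needed %zu, available %zu)\n",
                p.cur_end + size_needed, p.mem_size);
        return false; // the real allocator returns a null tensor here, which is later dereferenced
    }
    p.cur_end += size_needed;
    return true;
}

int main() {
    Pool p = { 256u*1024*1024, 0 };   // fixed 256 MB pool for this example
    pool_alloc(p, 200u*1024*1024);    // fits
    pool_alloc(p, 212u*1024*1024);    // does not fit -> prints the same style of error
    return 0;
}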

Example:

./build/bin/starcoder -m /workspaces/research/models/starcoder/starcoder-ggml.bin -p "def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test " --top_k 0 --top_p 0.95 --temp 0.2 

will cause the error

main: seed = 1684223471
starcoder_model_load: loading model from '/workspaces/research/models/starcoder/starcoder-ggml.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 1
starcoder_model_load: qntvr   = 0
starcoder_model_load: ggml ctx size = 51276.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 35916.23 MB
main: prompt: 'def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test '
main: number of tokens in prompt = 51, first 8 tokens: 589 28176 97 26 28176 97 28176 28176 

def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411952576)
Segmentation fault (core dumped)

(Here's another output from the quantized model)

vscode ➜ /workspaces/research/others/ggml (master) $ ./build/bin/starcoder -m /workspaces/research/models/starcoder/starcoder-ggml-q4_1.bin -p "def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test " --top_k 0 --top_p 0.95 --temp 0.2 
main: seed = 1684223600
starcoder_model_load: loading model from '/workspaces/research/models/starcoder/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 1003
starcoder_model_load: qntvr   = 1
starcoder_model_load: ggml ctx size = 28956.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 13596.23 MB
main: prompt: 'def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test '
main: number of tokens in prompt = 51, first 8 tokens: 589 28176 97 26 28176 97 28176 28176 

def fibo( fibo fib fibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibo test waterfibggml_new_tensor_impl: not enough space in the context's memory pool (needed 412241472, available 411790368)
Segmentation fault (core dumped)

The closest prior report I can find is ggerganov/llama.cpp#29

Maybe that was fixed for the llama models, but the problem has returned for starcoder?

Based on: #146

I'm specifically hoping @NouamaneTazi might have some insight into why this is happening.

@NouamaneTazi
Contributor

NouamaneTazi commented May 16, 2023

Interesting find! Thank you for raising this. Two questions:

@bluecoconut
Author

Just tried santacoder and it does seem to have the same problem, but at a very different scale (the error is the same). I had to put in >700, maybe around 1000 tokens, to trigger it, so this might just be a normal context-length issue?

Example code I used to test santacoder. (Note: this isn't run directly on the ggml executable but through ctransformers; the same errors show up as in the original post, where I ran the compiled ./starcoder directly, so I think it's safe to say it behaves the same on the underlying ggml.)

Python 3.10.11 (main, Apr 12 2023, 14:46:22) [GCC 10.2.1 20210110] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import lambdaprompt as lp
>>> import os
>>> os.environ['LAMBDAPROMPT_BACKEND'] = 'SantaCoderGGML'
>>> comp = lp.Completion("# Some code to print fibonacci numbers\n"*100, max_new_tokens=100)
>>> comp()
Fetching 0 files: 0it [00:00, ?it/s]
Fetching 1 files: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 25575.02it/s]
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268617232, available 268435456)
Segmentation fault (core dumped)

(I did one other test with "# Some code to print fibonacci numbers\n"*60 and this one successfully ran on santacoder)

>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*60))
720
>>> len(lp.backends.backends['completion'].model.tokenize("# Some code to print fibonacci numbers\n"*100))
1200

I'll try out starcoder.cpp and raw ggml with santacoder later, when I'm back at my machine.

@bluecoconut
Author

bigcode-project/starcoder.cpp#3

Seems someone else has run into this in starcoder.cpp as well.

@ggerganov
Owner

I tried looking into this, but the Python script from the example fails to download the model on macOS:

 $ ▶ python3 examples/starcoder/convert-hf-to-ggml.py bigcode/gpt_bigcode-santacoder
Loading model:  bigcode/gpt_bigcode-santacoder
Traceback (most recent call last):
  File "/Users/ggerganov/development/github/ggml/examples/starcoder/convert-hf-to-ggml.py", line 56, in <module>
    config = AutoConfig.from_pretrained(model_name, trust_remote_code=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 766, in from_pretrained
    config_class = CONFIG_MAPPING[config_dict["model_type"]]
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 473, in __getitem__
    raise KeyError(key)
KeyError: 'gpt_bigcode'

Any ideas how to fix this?

@NouamaneTazi
Contributor

@ggerganov I think you're on an old version of transformers
Try updating it: pip install -U transformers

@NouamaneTazi
Contributor

NouamaneTazi commented May 20, 2023

@ggerganov I've been trying to increase the context's memory pool by modifying this part of the code:

        ctx_size += 10 * 1024 * 1024; // TODO: tune this

        printf("%s: ggml ctx size = %6.2f MB\n", __func__, ctx_size/(1024.0*1024.0));

but it doesn't seem to affect ctx->mem_size, because the error message is always the same: ggml_new_tensor_impl: not enough space in the context's memory pool (needed 268637760, available 268435456) (i.e. ctx->mem_size is still 268435456 when it should be larger)

Any idea how to increase ctx->mem_size?
Relevant PR

@ggerganov
Owner

The problem is in the "eval" context:

static size_t buf_size = 256u*1024*1024;
static void * buf = malloc(buf_size);

if (mem_per_token > 0 && mem_per_token*N > buf_size) {
    const size_t buf_size_new = 1.1*(mem_per_token*N); // add 10% to account for ggml object overhead
    //printf("\n%s: reallocating buffer from %zu to %zu bytes\n", __func__, buf_size, buf_size_new);

    // reallocate
    buf_size = buf_size_new;
    buf = realloc(buf, buf_size);
    if (buf == nullptr) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, buf_size);
        return false;
    }
}
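
That buf/buf_size pair is what becomes the eval context's memory pool, i.e. the ctx->mem_size reported in the error, which is why changing the model-load ctx_size has no effect on it. Roughly (the field comments assume the usual ggml_init_params layout):

// Later in the same eval function, the buffer is handed to ggml_init(),
// so the context's pool size is exactly buf_size:
struct ggml_init_params params = {
    /*.mem_size   =*/ buf_size,
    /*.mem_buffer =*/ buf,
    /*.no_alloc   =*/ false,
};

struct ggml_context * ctx0 = ggml_init(params);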

Currently, it starts with a 256 MB buffer that is grown based on N alone.
This does not take n_past into account, and in general it is a very memory-wasteful approach, since the results of the entire compute graph are stored in this buffer.
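
For illustration only, a naive patch would be to grow the buffer based on the full sequence length processed so far rather than just the new chunk; this is a sketch of the idea, not the change I made:

// Hypothetical adjustment (illustration only): size the buffer for the whole
// sequence seen so far (n_past + N) instead of only the new chunk N.
const size_t needed = (size_t) (1.1*mem_per_token*(n_past + N)); // keep 10% headroom
if (mem_per_token > 0 && needed > buf_size) {
    buf_size = needed;
    buf = realloc(buf, buf_size);
    if (buf == nullptr) {
        fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, buf_size);
        return false;
    }
}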

Here I tried to improve this using scratch buffers: #176
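
Roughly, the scratch-buffer pattern looks like this (the buffer names and sizes here are placeholders, not the exact values from that PR):

// Placeholder scratch buffer: per-layer intermediate tensors are allocated in
// here and the space is reused from layer to layer, so the main eval context
// no longer has to hold the results of the entire compute graph.
static size_t scr0_size = 256u*1024*1024;
static void * scr0      = malloc(scr0_size);

// inside the per-layer loop of the eval function:
ggml_set_scratch(ctx0, { 0, scr0_size, scr0, });   // route new tensors into the scratch buffer
// ... build the attention / MLP nodes for this layer ...
ggml_set_scratch(ctx0, { 0, 0, nullptr, });        // switch back for tensors that must persist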

Please give it a try and let me know if your tests still crash with this version.

@vmajor

vmajor commented Jun 10, 2023

I am observing a similar issue with the Python wrapper llama-cpp-python:
abetlen/llama-cpp-python#356

@eshaanagarwal

Hi, I was trying the GPT4All 1.3 groovy model and I faced the same issue. I am not able to understand why this is happening. Can anybody provide me with a solution?

@vmajor

vmajor commented Jun 13, 2023

@eshaanagarwal the only "solution" I found was a reboot. Since rebooting is not an option, I had to switch to different models. For me, all 30B/33B models eventually develop this error when the input context approaches its upper limit. It does not affect the 65B models. I don't know of any other patterns, as this is just my use case.

@eshaanagarwal


@ggerganov can the memory leak or the underlying issue be fixed? Or can you point me in a possible direction for fixing it? I really need this model to work.

@ggerganov
Owner

@eshaanagarwal If you are using the latest version of the starcoder example, the issue should not occur. It was fixed in #176

If the issue still occurs, please provide more details about the model you are using, your system information, and the parameters with which you trigger the error.
