starcoder : example for using scratch buffers to reduce memory usage #176

Merged · 3 commits into master from starcoder-scratch · May 20, 2023

Conversation

@ggerganov (Owner)

Not ideal solution, but probably a good starting point.
Needs some testing
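
For context, this is roughly the scratch-buffer pattern used in ggml's examples: transient tensors are allocated from fixed, user-provided buffers via ggml_set_scratch() instead of the main context, and graph construction alternates between two buffers so that an op reading its inputs from one buffer can safely write its output into the other. A minimal sketch; buffer names and sizes here are illustrative, not the exact values from this PR:

```cpp
// Two fixed scratch buffers, alternated during graph construction.
static size_t scr0_size = 64u*1024u*1024u;  // illustrative size
static size_t scr1_size = 64u*1024u*1024u;
static void * scr0 = malloc(scr0_size);
static void * scr1 = malloc(scr1_size);

// inside the per-layer graph construction:
ggml_set_scratch(ctx0, { 0, scr0_size, scr0, });
// ... ops whose outputs are only needed transiently ...
ggml_set_scratch(ctx0, { 0, scr1_size, scr1, });
// ... the next group of ops ...

// switch scratch off before creating tensors that must outlive the
// graph evaluation (e.g. the logits):
ggml_set_scratch(ctx0, { 0, 0, nullptr, });
```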

@NouamaneTazi (Contributor)

Tested santacoder with a prompt of length 2003, generated 45 tokens, and it worked 🎉

Thank you for taking care of this.

@ggerganov (Owner, Author)

Cool!
The sizes might need to be adjusted for the big starcoder model.
Do you have it handy to test and see if that is the case?

@NouamaneTazi (Contributor)

Seems like it still doesn't work for starcoder :/

starcoder_model_load: loading model from '../models/bigcode/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 26724.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 11364.23 MB
main: prompt: '# a a a a a a a a a a a a a a a a a a a 
...
def fibonacci('
main: number of tokens in prompt = 8005
...
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
ggml_new_tensor_impl: not enough space in the scratch memory
Segmentation fault (core dumped)

@ggerganov (Owner, Author)

Bumped the buffers to 256 MB. Still not sure if that is enough; need to figure out a better way to do this.
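
If the code follows the pattern sketched earlier, the bump itself would be a small constant change along these lines (hypothetical, not the literal diff from the PR):

```cpp
// hypothetical: raise both scratch buffers to 256 MB
static size_t scr0_size = 256u*1024u*1024u;
static size_t scr1_size = 256u*1024u*1024u;
```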

@NouamaneTazi (Contributor)

Starcoder works now with 8k context! 🎉

$ ./bin/starcoder -m ../models/bigcode/starcoder-ggml-q4_1.bin -p "$(cat ../prompt.txt)" --top_k 0 --top_p 0.95 --temp 0.2 -n 200 -t 48
main: seed = 1684603021
starcoder_model_load: loading model from '../models/bigcode/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 26724.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 11364.23 MB

main: mem per token =   438584 bytes
main:     load time =  3832.32 ms
main:   sample time =     6.71 ms
main:  predict time = 1304651.88 ms / 161.87 ms per token
main:    total time = 1308736.00 ms

ggerganov merged commit d695755 into master on May 20, 2023
ggerganov deleted the starcoder-scratch branch on May 20, 2023 at 17:56
@ggerganov (Owner, Author)

Btw, I saw you are using -t 48; is that actually faster on your system?
I usually don't go over -t 8, even if the machine has many cores.

@NouamaneTazi (Contributor)

I didn't compare 8 to 48 threads, but 48 was definitely faster than the default of 4.

@ExternIden

I have fixes for these in ggml; I put in PRs, but they are stuck in GitHub's flag filters.

The segfault issue is mem_per_token not incrementing; see mpt main.cpp:

[screenshot of the mem_per_token change in examples/mpt/main.cpp]

The ctx flags were added in a separate PR. They seem to work with high token counts; tested with 65k. It benefits from setting n_predict to the same value as the ctx size, since the -1 value does not seem to be supported in ggml.
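
Reading between the lines: the stock ggml examples only set mem_per_token on the first eval call, so the estimate never tracks the growing context. A minimal sketch of what the fix could look like, reconstructed from the comment rather than the actual patch (the std::max variant is an assumption):

```cpp
// At the end of the example's eval function (needs <algorithm> for std::max).
// Stock code only measured once:
//
//     if (mem_per_token == 0) {
//         mem_per_token = ggml_used_mem(ctx0)/N;
//     }
//
// Hypothetical fix: keep updating the estimate, since the per-token
// footprint grows with context length; a stale value undersizes the
// buffers for later calls and can lead to the segfault above.
mem_per_token = std::max(mem_per_token, ggml_used_mem(ctx0)/size_t(N));
```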

@eshaanagarwal

> Starcoder works now with 8k context! 🎉
> [full command and log quoted above]

Hey, I am facing this issue with Groovy 1.3 on GPT4All. I don't know why. Can you tell me about it?
