starcoder : example for using scratch buffers to reduce memory usage #176

Merged · 3 commits into master from starcoder-scratch · May 20, 2023

Conversation

@ggerganov (Owner)

Not ideal solution, but probably a good starting point.
Needs some testing
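
For context, this is roughly the scratch-buffer pattern used in ggml's examples: transient tensors are allocated from fixed, user-provided buffers via ggml_set_scratch() instead of the main context, and graph construction alternates between two buffers so that an op reading its inputs from one buffer can safely write its output into the other. A minimal sketch; buffer names and sizes here are illustrative, not the exact values from this PR:

```cpp
// Two fixed scratch buffers, alternated during graph construction.
static size_t scr0_size = 64u*1024u*1024u;  // illustrative size
static size_t scr1_size = 64u*1024u*1024u;
static void * scr0 = malloc(scr0_size);
static void * scr1 = malloc(scr1_size);

// inside the per-layer graph construction:
ggml_set_scratch(ctx0, { 0, scr0_size, scr0, });
// ... ops whose outputs are only needed transiently ...
ggml_set_scratch(ctx0, { 0, scr1_size, scr1, });
// ... the next group of ops ...

// switch scratch off before creating tensors that must outlive the
// graph evaluation (e.g. the logits):
ggml_set_scratch(ctx0, { 0, 0, nullptr, });
```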

@NouamaneTazi (Contributor)

Tested santacoder with a prompt of length 2003, generated 45 tokens, and it worked 🎉

Thank you for taking care of this.

@ggerganov (Owner, Author)

Cool!
The sizes might need to be adjusted for the big starcoder model.
Do you have it handy to test and see if that is the case?

@NouamaneTazi (Contributor)

Seems like it still doesn't work for starcoder :/

starcoder_model_load: loading model from '../models/bigcode/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 26724.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 11364.23 MB
main: prompt: '# a a a a a a a a a a a a a a a a a a a 
...
def fibonacci('
main: number of tokens in prompt = 8005
...
 a a a a a a a a a a a a a a a a a a a a a a a a a a a a a a
ggml_new_tensor_impl: not enough space in the scratch memory
Segmentation fault (core dumped)

@ggerganov (Owner, Author)

Bumped the buffers to 256 MB. Still not sure if that is enough; need to figure out a better way to do this.
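
If the code follows the pattern sketched earlier, the bump itself would be a small constant change along these lines (hypothetical, not the literal diff from the PR):

```cpp
// hypothetical: raise both scratch buffers to 256 MB
static size_t scr0_size = 256u*1024u*1024u;
static size_t scr1_size = 256u*1024u*1024u;
```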

@NouamaneTazi (Contributor)

Starcoder works now with 8k context! 🎉

$ ./bin/starcoder -m ../models/bigcode/starcoder-ggml-q4_1.bin -p "$(cat ../prompt.txt)" --top_k 0 --top_p 0.95 --temp 0.2 -n 200 -t 48
main: seed = 1684603021
starcoder_model_load: loading model from '../models/bigcode/starcoder-ggml-q4_1.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx   = 8192
starcoder_model_load: n_embd  = 6144
starcoder_model_load: n_head  = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype   = 2003
starcoder_model_load: qntvr   = 2
starcoder_model_load: ggml ctx size = 26724.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size  = 11364.23 MB

main: mem per token =   438584 bytes
main:     load time =  3832.32 ms
main:   sample time =     6.71 ms
main:  predict time = 1304651.88 ms / 161.87 ms per token
main:    total time = 1308736.00 ms

ggerganov merged commit d695755 into master on May 20, 2023
ggerganov deleted the starcoder-scratch branch on May 20, 2023 at 17:56
@ggerganov (Owner, Author)

Btw, I saw you are using -t 48; is that actually faster on your system?
I usually don't go over -t 8, even if the machine has many cores.

@NouamaneTazi (Contributor)

I didn't compare 8 to 48 threads, but 48 was definitely faster than the default of 4.

@ExternIden

I have fixes for these in ggml; I put in PRs, but they are stuck in GitHub's flag filters.

The segfault issue is mem_per_token not incrementing; see mpt main.cpp:

[screenshot of the mem_per_token change in examples/mpt/main.cpp]

The ctx flags were added in a separate PR. They seem to work with high token counts; tested with 65k. It benefits from setting n_predict to the same value as the ctx size, since the -1 value does not seem to be supported in ggml.
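
Reading between the lines: the stock ggml examples only set mem_per_token on the first eval call, so the estimate never tracks the growing context. A minimal sketch of what the fix could look like, reconstructed from the comment rather than the actual patch (the std::max variant is an assumption):

```cpp
// At the end of the example's eval function (needs <algorithm> for std::max).
// Stock code only measured once:
//
//     if (mem_per_token == 0) {
//         mem_per_token = ggml_used_mem(ctx0)/N;
//     }
//
// Hypothetical fix: keep updating the estimate, since the per-token
// footprint grows with context length; a stale value undersizes the
// buffers for later calls and can lead to the segfault above.
mem_per_token = std::max(mem_per_token, ggml_used_mem(ctx0)/size_t(N));
```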

@eshaanagarwal

> Starcoder works now with 8k context! 🎉
> [full command and log quoted above]

Hey, I am facing this issue with Groovy 1.3 on GPT4All. I don't know why. Can you tell me about it?
