
Support for RedPajama #134

Closed · wants to merge 6 commits

Conversation

@amirza1 commented May 6, 2023

This adds support for RedPajama, which is GPT-NeoX with use_parallel_residual=False.
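
For context, that flag changes how the attention and MLP branches of each transformer block are combined. A minimal sketch of the difference, in plain NumPy with placeholder branches (illustration only, not the ggml code):

import numpy as np

# Placeholder branches; a real block uses multi-head attention,
# a GELU MLP, and learned LayerNorm parameters.
def ln(x):   return (x - x.mean()) / (x.std() + 1e-5)
def attn(x): return 0.5 * x
def mlp(x):  return 2.0 * x

def gpt_neox_block(x, use_parallel_residual):
    if use_parallel_residual:
        # Standard GPT-NeoX: both branches read the same block input.
        return x + attn(ln(x)) + mlp(ln(x))
    # RedPajama (use_parallel_residual=False): sequential residuals;
    # the MLP sees the post-attention hidden state.
    h = x + attn(ln(x))
    return h + mlp(ln(h))

x = np.random.randn(8)
print(gpt_neox_block(x, True))
print(gpt_neox_block(x, False))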

@Green-Sky (Contributor)

Just tested, and it works. But I feel like adding more and more small variations of the same code is kind of bad; we should merge stablelm and redpajama (and dolly?) and call it gptneox :)

@amirza1 (Author) commented May 7, 2023 via email

@ggerganov (Owner)

Yup, will combine them soon. Just need some time to test the Dolly model and make sure the inference is correct; currently I can't convert the model on macOS since there is no bfloat16 Python support.
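
A possible workaround (an assumption on my part, not something confirmed in this thread) is to upcast bfloat16 tensors to float32 in PyTorch before handing them to NumPy:

import torch

# Sketch only; assumes the checkpoint is a plain PyTorch state dict
# and "pytorch_model.bin" is a placeholder path.
state = torch.load("pytorch_model.bin", map_location="cpu")
for name, t in state.items():
    if t.dtype == torch.bfloat16:
        t = t.to(torch.float32)  # NumPy has no bfloat16, so upcast first
    arr = t.numpy()              # now safe to convert and write out as f32/f16
    print(name, arr.dtype, arr.shape)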

@keldenl commented May 8, 2023

@amirza1 awesome stuff! I'm uploading ggml models on Hugging Face: https://huggingface.co/huggersbro/RedPajama-INCITE-Chat-3B-v1-GGML (should be good to link since it's open source, right?)

@mverrilli (Contributor) commented May 8, 2023

Hi @ggerganov

> currently I can't convert the model on macOS since there is no bfloat16 Python support

I liked @keldenl's idea so I posted the ggml bins here if that helps you test:
https://huggingface.co/mverrilli/dolly-v2-3b-ggml/tree/main
https://huggingface.co/mverrilli/dolly-v2-12b-ggml/tree/main

Let me know if I can assist.

@keldenl commented May 8, 2023

Here's the 3B instruct model: https://huggingface.co/keldenl/RedPajama-INCITE-Instruct-3B-v1-GGML/

@amirza1 @ggerganov should we link these ggml models in the README like gpt-2 (since this has an Apache 2.0 license) as an alternative option (i.e., "or you can get the ggml directly")?

Update: Here's the 7B instruct model https://huggingface.co/keldenl/RedPajama-INCITE-Instruct-7B-v0.1-GGML

Only the 7B chat model is left; I'll upload it later tonight.

@mudler (Contributor) commented May 9, 2023

> Hi @ggerganov
>
> > currently I can't convert the model on macOS since there is no bfloat16 Python support
>
> I liked @keldenl's idea so I posted the ggml bins here if that helps you test:
> https://huggingface.co/mverrilli/dolly-v2-3b-ggml/tree/main
> https://huggingface.co/mverrilli/dolly-v2-12b-ggml/tree/main
>
> Let me know if I can assist.

I gave it a shot and tried the Q5 model locally; no luck so far:

main: seed = 1683660204
dollyv2_model_load: loading model from '/models/ggml-dolly-q5_0.bin' - please wait ...
dollyv2_model_load: n_vocab = 50280
dollyv2_model_load: n_ctx   = 2048
dollyv2_model_load: n_embd  = 4096
dollyv2_model_load: n_head  = 32
dollyv2_model_load: n_layer = 32
dollyv2_model_load: n_rot   = 32
dollyv2_model_load: ftype   = 8
dollyv2_model_load: ggml ctx size = 8596.22 MB
dollyv2_model_load: memory_size =  1024.00 MB, n_mem = 65536
dollyv2_model_load: unknown tensor 'gpt_neox.embed_in.weight' in model file
main: failed to load model from '/models/ggml-dolly-q5_0.bin'

@mverrilli (Contributor)

@mudler This works fine for me.

This is the mverrilli/dolly-v2-12b-ggml/ggml-model-q5_0.bin model, correct?

Maybe check your hash? SHA256: 79280421cc792330eaa56621060b8e2fb48ef570ace4572a91a1cf0e18ce7f38
I verified mine matches what's on HF.

There isn't a lot of error handling in the examples. Do you have enough RAM to load the model?
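
For reference, a generic streaming hash check in Python (not from this thread; the file name is a placeholder):

import hashlib

def sha256_of(path, chunk_size=1 << 20):
    # Stream in chunks so multi-GB model files don't need to fit in memory.
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

expected = "79280421cc792330eaa56621060b8e2fb48ef570ace4572a91a1cf0e18ce7f38"
print(sha256_of("ggml-model-q5_0.bin") == expected)  # placeholder path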

@mudler (Contributor) commented May 10, 2023

> @mudler This works fine for me.
>
> This is the mverrilli/dolly-v2-12b-ggml/ggml-model-q5_0.bin model, correct?
>
> Maybe check your hash? SHA256: 79280421cc792330eaa56621060b8e2fb48ef570ace4572a91a1cf0e18ce7f38
> I verified mine matches what's on HF.
>
> There isn't a lot of error handling in the examples. Do you have enough RAM to load the model?

I've downloaded the 7B model (https://huggingface.co/mverrilli/dolly-v2-7b-ggml/blob/main/ggml-model-q5_0.bin):

~/_git/LocalAI
base ❯ sha256sum models/ggml-dolly-q5_0.bin            
9926cddcccd5c4d61a43ec05c8999147ae3c1deac7af636d3ffc618d7d30514b  models/ggml-dolly-q5_0.bin

Note that I have 64 GB of RAM, so that shouldn't be the issue.

@mverrilli (Contributor)

@mudler The hash matches mine, and after re-pulling master and rebuilding, it is working for me. I don't want to clutter up this PR any further; if you want to create a new issue, I can work through it with you.

@ggerganov (Owner)

I've been a bit busy these days; I will start looking into the newly proposed models here soon.

Please check whether #139 works with RedPajama. If it does, I think we should merge it instead of adding a new example, in order to reduce code duplication.

@ggerganov (Owner)

I've decided to merge #139.
I haven't tested RedPajama yet, so if anyone can give it a try using the latest master, please report whether it works correctly.
I will close this PR; feel free to report any issues with the new gpt-neox example.

ggerganov closed this May 13, 2023

@Green-Sky (Contributor)

Reconverted, and it works 👍

$ bin/gpt-neox -m ../examples/gpt-neox/models/RedPajama-INCITE-Base-3B-v1/ggml-model-f16.bin
main: seed = 1683973640
gpt_neox_model_load: loading model from '../examples/gpt-neox/models/RedPajama-INCITE-Base-3B-v1/ggml-model-f16.bin' - please wait ...
gpt_neox_model_load: n_vocab = 50432
gpt_neox_model_load: n_ctx   = 2048
gpt_neox_model_load: n_embd  = 2560
gpt_neox_model_load: n_head  = 32
gpt_neox_model_load: n_layer = 32
gpt_neox_model_load: n_rot   = 80
gpt_neox_model_load: par_res = 0
gpt_neox_model_load: ftype   = 1
gpt_neox_model_load: ggml ctx size = 7376.40 MB
gpt_neox_model_load: memory_size =   640.00 MB, n_mem = 65536
gpt_neox_model_load: ................................................ done
gpt_neox_model_load: model size =  5296.58 MB / num tensors = 388
main: number of tokens in prompt = 1
main: token[0] =   4553, After

After all, the first thing he did when he got to the table was to ask me for a light, and then he asked for a beer. It's the least I could do."

"I see," she said, and she did. He was a good-looking man, a little taller than she was, and a little broader too. He had dark, wavy hair and blue eyes and an open, easy manner. "Well," she said, "I have to go to the bathroom, and it's getting late. I'll see you tomorrow night."

"Thanks," he said. "I look forward to it. I'll send the bill to you."

"I'll put it on my account," she said. "The bill will be coming from me."

"Oh," he said, and smiled. "You're very careful with your money."

She had to smile too. "You're probably right," she said.


main: mem per token = 16137296 bytes
main:     load time =  1440.80 ms
main:   sample time =    20.09 ms
main:  predict time = 26467.06 ms / 132.34 ms per token
main:    total time = 28234.08 ms

@NancyAurum

That commit has a bug: it calls

const int64_t t_main_start_us = ggml_time_us();

but never calls ggml_time_init(), so timer_freq is left uninitialized and there is a division by zero. Calling ggml_time_init() once at startup initializes timer_freq and avoids the crash.
