
StableLM example #96

Merged
ggerganov merged 7 commits into master from stablelm on Apr 20, 2023

Conversation

ggerganov (Owner) commented Apr 19, 2023

Usage: https://github.com/ggerganov/ggml/tree/stablelm/examples/stablelm

TODO:

  • try to avoid the complex massaging of the QKV tensors during conversion (see the sketch after this list)
  • double-check the RoPE computation
  • add instructions + download scripts
  • test 7B model
  • fix the ggml_forward_dup_xxx() bug
  • support non-parallel residual
  • implement tokenizer
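
For reference, the first TODO item refers to the fused attention weight in GPT-NeoX checkpoints: query_key_value.weight stores the Q, K and V projections interleaved per head, and the converter has to pull them apart. Below is a minimal sketch of that unpacking, assuming the Hugging Face layout (per head: head_dim rows of Q, then K, then V); the helper name is hypothetical, not part of this PR:

    // Hypothetical sketch: split a fused (3*n_embd, n_embd) GPT-NeoX QKV
    // weight into separate Q/K/V matrices, assuming the HF layout where
    // each head contributes head_dim rows of Q, then K, then V in order.
    #include <stddef.h>
    #include <string.h>

    void split_qkv(const float *qkv, float *q, float *k, float *v,
                   int n_head, int n_embd) {
        const int    hd   = n_embd / n_head;        // head_dim
        const size_t hrow = (size_t) hd * n_embd;   // floats per head block
        for (int h = 0; h < n_head; ++h) {
            const float *src = qkv + (size_t) h * 3 * hrow;
            memcpy(q + (size_t) h * hrow, src,            hrow * sizeof(float));
            memcpy(k + (size_t) h * hrow, src + hrow,     hrow * sizeof(float));
            memcpy(v + (size_t) h * hrow, src + 2 * hrow, hrow * sizeof(float));
        }
    }

Doing this once at conversion time keeps the runtime graph simple, at the cost of exactly the "massaging" the TODO item wants to avoid.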

slaren (Collaborator) commented Apr 19, 2023

I do not see an issue in that specific path, but this looks wrong; i11 is incremented twice:

ggml/src/ggml.c, lines 5774 to 5775 at commit 2b07ae8:

    i11++;
    if (++i11 == ne1) {
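
For reference, the fix is presumably to drop the standalone i11++ so the index advances exactly once per element; a sketch of the corrected form:

    // corrected (sketch): advance i11 once per element, wrapping at ne1
    if (++i11 == ne1) {
        i11 = 0;
        // ... carry into the next-outer index here ...
    }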

LostRuins (Contributor) commented

This looks exciting and I can't wait.

Btw, I don't know if you've already seen this, but there was a previous attempt at a NeoX implementation in ggml; thought I'd link it for reference just in case:

https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/ggml.c#L7292

ggerganov (Owner, Author) commented Apr 20, 2023

Yes, I know. I took the QKV unpacking from their repo, but I think there has to be a better way to do it.

ggerganov (Owner, Author) commented

@slaren Yup, that was the issue.

ggerganov merged commit b3799aa into master on Apr 20, 2023
ggerganov deleted the stablelm branch on Apr 20, 2023
rabidcopy referenced this pull request in LostRuins/koboldcpp on Apr 24, 2023:
…d updated quantizers and quantization handling for gpt neox gpt 2 and gptj
rabidcopy commented
For what it's worth, StableLM 3B works for me, but StableLM 7B doesn't.

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file
main: failed to load model from '/home/rabid/Desktop/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'

ggerganov (Owner, Author) commented

Seems to be working on my side - both the F16 and the quantized model. Can you double-check using the latest master?

make -j && ./bin/stablelm -m models/stablelm-base-alpha-7b/ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 64
[  4%] Built target common
[  8%] Built target ggml
[ 12%] Built target test-mul-mat2
[ 16%] Built target test-mul-mat0
[ 20%] Built target test-vec0
[ 24%] Built target test3
[ 28%] Built target test0
[ 32%] Built target test-grad0
[ 38%] Built target test2
[ 40%] Built target test-vec2
[ 48%] Built target test-blas0
[ 48%] Built target gpt-2
[ 54%] Built target test-mul-mat1
[ 56%] Built target test1
[ 60%] Built target gpt-j
[ 64%] Built target common-ggml
[ 68%] Built target test-svd0
[ 72%] Built target stablelm
[ 76%] Built target whisper-cpp
[ 88%] Built target stablelm-quantize
[ 88%] Built target gpt-2-quantize
[ 88%] Built target mnist
[ 92%] Built target gpt-j-quantize
[ 96%] Built target whisper-quantize
[100%] Built target whisper
main: seed = 1682359069
stablelm_model_load: loading model from 'models/stablelm-base-alpha-7b/ggml-model-q4_0.bin' - please wait ...
stablelm_model_load: n_vocab = 50432
stablelm_model_load: n_ctx   = 4096
stablelm_model_load: n_embd  = 6144
stablelm_model_load: n_head  = 48
stablelm_model_load: n_layer = 16
stablelm_model_load: n_rot   = 32
stablelm_model_load: ftype   = 2
stablelm_model_load: ggml ctx size = 10069.99 MB
stablelm_model_load: memory_size =  1536.00 MB, n_mem = 65536
stablelm_model_load: ........................ done
stablelm_model_load: model size =  4694.30 MB / num tensors = 196
main: number of tokens in prompt = 7
main: token[0] =     42, I
main: token[1] =   2868,  believe
main: token[2] =    253,  the
main: token[3] =   4495,  meaning
main: token[4] =    273,  of
main: token[5] =   1495,  life
main: token[6] =    310,  is

I believe the meaning of life is to reproduce, and reproduce only. It is a very tiny fraction of what it is to be a living entity. 

If you're in a far away land and you're not allowed to travel far away in time, or on the other hand you're an ant in an ant farm and you're not allowed to

main: mem per token = 19317064 bytes
main:     load time =  1469.07 ms
main:   sample time =     9.64 ms
main:  predict time =  3700.70 ms / 52.87 ms per token
main:    total time =  5408.43 ms

rabidcopy commented Apr 24, 2023

Still not loading on latest master for me...

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file
main: failed to load model from '/home/rabid/Desktop/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'

Taken from https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b/blob/main/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin. (I can't verify which conversion/quantization this used; I don't have enough RAM to convert 7B myself.)
This Pythia ggml conversion doesn't appear to load either (Pythia is also GPT-NeoX-based): https://huggingface.co/Merry/ggml-pythia-deduped/blob/main/2023-04-20/ggml-pythia-1b-deduped-q4_3.bin

Converted using ggerganov/ggml's "stablelm" conversion script and quantization code as of commit 05f3079 (2023-04-20).

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file: got 77266944, expected 4532994048
main: failed to load model from '/home/rabid/Desktop/ggml-pythia-1b-deduped-q4_3.bin'

Edit: Just now realized there were changes to the quantize code here two days ago; I'm guessing older quantizations won't work.
Edit 2: That was it. Reconverted/quantized Pythia 1B and it works now. Sorry for the false alarm.
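
For context on why stale files stop loading: the loader derives each tensor's expected byte count from its ggml type, so any change to the on-disk block layout changes that number and trips the "has wrong size" check. A minimal illustration, assuming the q4_0 layout of one fp32 scale per 32-weight block and the 7B shapes from the log above (the helper is hypothetical, not ggml's actual code):

    #include <stdio.h>
    #include <stddef.h>

    #define QK 32  // weights per quantization block (illustrative assumption)

    // q4_0-style size: per block, one fp32 scale plus QK packed 4-bit values
    static size_t q4_0_bytes(size_t n_weights) {
        return (n_weights / QK) * (sizeof(float) + QK / 2);
    }

    int main(void) {
        // embedding matrix from the 7B log: n_vocab x n_embd = 50432 x 6144
        printf("%zu bytes\n", q4_0_bytes((size_t) 50432 * 6144));
        return 0;
    }

If a converter wrote blocks in an older layout, the byte count in the file no longer matches this computed size, which is exactly the error hit here.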

sroussey commented

How can I use stablelm to get embeddings back?
