
StableLM example #96

Merged
ggerganov merged 7 commits into master from stablelm on Apr 20, 2023

Conversation

ggerganov (Owner) commented Apr 19, 2023

Usage: https://github.com/ggerganov/ggml/tree/stablelm/examples/stablelm

TODO:

  • try to avoid the complex massaging of the QKV tensors during conversion (see the sketch after this list)
  • double-check the RoPE computation
  • add instructions + download scripts
  • test 7B model
  • fix the ggml_forward_dup_xxx() bug
  • support non-parallel residual
  • implement tokenizer
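
For reference, the first TODO item refers to the fused attention weight in GPT-NeoX checkpoints: query_key_value.weight stores the Q, K and V projections interleaved per head, and the converter has to pull them apart. Below is a minimal sketch of that unpacking, assuming the Hugging Face layout (per head: head_dim rows of Q, then K, then V); the helper name is hypothetical, not part of this PR:

    // Hypothetical sketch: split a fused (3*n_embd, n_embd) GPT-NeoX QKV
    // weight into separate Q/K/V matrices, assuming the HF layout where
    // each head contributes head_dim rows of Q, then K, then V in order.
    #include <stddef.h>
    #include <string.h>

    void split_qkv(const float *qkv, float *q, float *k, float *v,
                   int n_head, int n_embd) {
        const int    hd   = n_embd / n_head;        // head_dim
        const size_t hrow = (size_t) hd * n_embd;   // floats per head block
        for (int h = 0; h < n_head; ++h) {
            const float *src = qkv + (size_t) h * 3 * hrow;
            memcpy(q + (size_t) h * hrow, src,            hrow * sizeof(float));
            memcpy(k + (size_t) h * hrow, src + hrow,     hrow * sizeof(float));
            memcpy(v + (size_t) h * hrow, src + 2 * hrow, hrow * sizeof(float));
        }
    }

Doing this once at conversion time keeps the runtime graph simple, at the cost of exactly the "massaging" the TODO item wants to avoid.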

slaren (Collaborator) commented Apr 19, 2023

I do not see an issue in that specific path, but this looks wrong; i11 is incremented twice:

ggml/src/ggml.c, lines 5774 to 5775 at commit 2b07ae8:

    i11++;
    if (++i11 == ne1) {
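
For reference, the fix is presumably to drop the standalone i11++ so the index advances exactly once per element; a sketch of the corrected form:

    // corrected (sketch): advance i11 once per element, wrapping at ne1
    if (++i11 == ne1) {
        i11 = 0;
        // ... carry into the next-outer index here ...
    }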

LostRuins (Contributor) commented

This looks exciting and I can't wait.

Btw, I don't know if you've already seen this, but there was a previous attempt at a NeoX implementation in ggml; thought I'd link it for reference just in case:

https://github.com/NolanoOrg/cformers/blob/master/cformers/cpp/ggml.c#L7292

ggerganov (Owner, Author) commented Apr 20, 2023

Yes, I know. I took the QKV unpacking from their repo, but I think there has to be a better way to do it.

ggerganov (Owner, Author) commented

@slaren Yup, that was the issue.

ggerganov merged commit b3799aa into master on Apr 20, 2023
ggerganov deleted the stablelm branch on Apr 20, 2023
rabidcopy referenced this pull request in LostRuins/koboldcpp on Apr 24, 2023:
…d updated quantizers and quantization handling for gpt neox gpt 2 and gptj
rabidcopy commented
For what it's worth, StableLM 3B works for me, but StableLM 7B doesn't.

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file
main: failed to load model from '/home/rabid/Desktop/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'

ggerganov (Owner, Author) commented

Seems to be working on my side - both the F16 and the quantized model. Can you double-check using the latest master?

make -j && ./bin/stablelm -m models/stablelm-base-alpha-7b/ggml-model-q4_0.bin -p "I believe the meaning of life is" -t 8 -n 64
[  4%] Built target common
[  8%] Built target ggml
[ 12%] Built target test-mul-mat2
[ 16%] Built target test-mul-mat0
[ 20%] Built target test-vec0
[ 24%] Built target test3
[ 28%] Built target test0
[ 32%] Built target test-grad0
[ 38%] Built target test2
[ 40%] Built target test-vec2
[ 48%] Built target test-blas0
[ 48%] Built target gpt-2
[ 54%] Built target test-mul-mat1
[ 56%] Built target test1
[ 60%] Built target gpt-j
[ 64%] Built target common-ggml
[ 68%] Built target test-svd0
[ 72%] Built target stablelm
[ 76%] Built target whisper-cpp
[ 88%] Built target stablelm-quantize
[ 88%] Built target gpt-2-quantize
[ 88%] Built target mnist
[ 92%] Built target gpt-j-quantize
[ 96%] Built target whisper-quantize
[100%] Built target whisper
main: seed = 1682359069
stablelm_model_load: loading model from 'models/stablelm-base-alpha-7b/ggml-model-q4_0.bin' - please wait ...
stablelm_model_load: n_vocab = 50432
stablelm_model_load: n_ctx   = 4096
stablelm_model_load: n_embd  = 6144
stablelm_model_load: n_head  = 48
stablelm_model_load: n_layer = 16
stablelm_model_load: n_rot   = 32
stablelm_model_load: ftype   = 2
stablelm_model_load: ggml ctx size = 10069.99 MB
stablelm_model_load: memory_size =  1536.00 MB, n_mem = 65536
stablelm_model_load: ........................ done
stablelm_model_load: model size =  4694.30 MB / num tensors = 196
main: number of tokens in prompt = 7
main: token[0] =     42, I
main: token[1] =   2868,  believe
main: token[2] =    253,  the
main: token[3] =   4495,  meaning
main: token[4] =    273,  of
main: token[5] =   1495,  life
main: token[6] =    310,  is

I believe the meaning of life is to reproduce, and reproduce only. It is a very tiny fraction of what it is to be a living entity. 

If you're in a far away land and you're not allowed to travel far away in time, or on the other hand you're an ant in an ant farm and you're not allowed to

main: mem per token = 19317064 bytes
main:     load time =  1469.07 ms
main:   sample time =     9.64 ms
main:  predict time =  3700.70 ms / 52.87 ms per token
main:    total time =  5408.43 ms

rabidcopy commented Apr 24, 2023

Still not loading on latest master for me...

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file
main: failed to load model from '/home/rabid/Desktop/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin'

Taken from https://huggingface.co/cakewalk/ggml-q4_0-stablelm-tuned-alpha-7b/blob/main/ggml-model-stablelm-tuned-alpha-7b-q4_0.bin. (I can't verify which conversion/quantization this used; I don't have enough RAM to convert 7B myself.)
This Pythia ggml conversion doesn't appear to load either (Pythia is also GPT-NeoX-based): https://huggingface.co/Merry/ggml-pythia-deduped/blob/main/2023-04-20/ggml-pythia-1b-deduped-q4_3.bin

Converted using ggerganov/ggml's "stablelm" conversion script and quantization code as of commit 05f3079 (2023-04-20).

stablelm_model_load: tensor 'gpt_neox.embed_in.weight' has wrong size in model file: got 77266944, expected 4532994048
main: failed to load model from '/home/rabid/Desktop/ggml-pythia-1b-deduped-q4_3.bin'

Edit: Just now realized there were changes to the quantize code here two days ago; I'm guessing older quantizations won't work.
Edit 2: That was it. Reconverted/quantized Pythia 1B and it works now. Sorry for the false alarm.
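
For context on why stale files stop loading: the loader derives each tensor's expected byte count from its ggml type, so any change to the on-disk block layout changes that number and trips the "has wrong size" check. A minimal illustration, assuming the q4_0 layout of one fp32 scale per 32-weight block and the 7B shapes from the log above (the helper is hypothetical, not ggml's actual code):

    #include <stdio.h>
    #include <stddef.h>

    #define QK 32  // weights per quantization block (illustrative assumption)

    // q4_0-style size: per block, one fp32 scale plus QK packed 4-bit values
    static size_t q4_0_bytes(size_t n_weights) {
        return (n_weights / QK) * (sizeof(float) + QK / 2);
    }

    int main(void) {
        // embedding matrix from the 7B log: n_vocab x n_embd = 50432 x 6144
        printf("%zu bytes\n", q4_0_bytes((size_t) 50432 * 6144));
        return 0;
    }

If a converter wrote blocks in an older layout, the byte count in the file no longer matches this computed size, which is exactly the error hit here.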

sroussey commented

How can I use stablelm to get embeddings back?
