Support for bigcode/starcoder #188

Closed
seyyedaliayati opened this issue May 23, 2023 · 8 comments

@seyyedaliayati

Hi!
I saw the example for the bigcode/gpt_bigcode-santacoder model. I am wondering how I can run the bigcode/starcoder model on CPU with a similar approach.

When I run the following command:

```sh
python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
```

I encounter this error:

```text
OSError: Consistency check failed: file should be of size 9904379239 but has size 2282030570 ((…)l-00001-of-00007.bin).
```

Any ideas or help would be greatly appreciated.
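
A consistency check failure like this usually means a shard download was interrupted, leaving a truncated file in the Hugging Face cache (note the reported size is well short of the expected ~9.9 GB). A minimal sketch of forcing a clean re-download with the standard `huggingface_hub` API, assuming the default cache location:

```python
# Sketch: re-fetch the bigcode/starcoder shards after a truncated download.
# force_download=True makes snapshot_download discard any partially
# downloaded cache entries instead of trusting them.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="bigcode/starcoder",
    force_download=True,  # ignore possibly corrupt cached files
)
```

After a clean download, the convert script should get past the size check.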

@s-kostyaev

%  git diff

```diff
diff --git a/examples/starcoder/main.cpp b/examples/starcoder/main.cpp
index c9d1d7e..1972732 100644
--- a/examples/starcoder/main.cpp
+++ b/examples/starcoder/main.cpp
@@ -18,11 +18,11 @@
 // https://huggingface.co/bigcode/gpt_bigcode-santacoder/blob/main/config.json
 struct starcoder_hparams {
     int32_t n_vocab = 49280;
-    int32_t n_ctx   = 2048;
-    int32_t n_embd  = 2048;
-    int32_t n_head  = 16;
-    int32_t n_layer = 24;
-    int32_t ftype   = 1;
+    int32_t n_ctx   = 8192;
+    int32_t n_embd  = 6144;
+    int32_t n_head  = 48;
+    int32_t n_layer = 40;
+    int32_t ftype   = 1;
 };

 struct starcoder_layer {
diff --git a/examples/starcoder/quantize.cpp b/examples/starcoder/quantize.cpp
index 101af50..09111bb 100644
--- a/examples/starcoder/quantize.cpp
+++ b/examples/starcoder/quantize.cpp
@@ -16,11 +16,11 @@
 // default hparams (GPT-2 117M)
 struct starcoder_hparams {
     int32_t n_vocab = 49280;
-    int32_t n_ctx   = 2048;
-    int32_t n_embd  = 2048;
-    int32_t n_head  = 16;
-    int32_t n_layer = 24;
-    int32_t ftype   = 1;
+    int32_t n_ctx   = 8192;
+    int32_t n_embd  = 6144;
+    int32_t n_head  = 48;
+    int32_t n_layer = 40;
+    int32_t ftype   = 1;
 };

 // quantize a model
```

Converted and quantized models can be found here:
https://huggingface.co/NeoDim/starcoder-GGML
https://huggingface.co/NeoDim/starcoderbase-GGML
https://huggingface.co/NeoDim/starchat-alpha-GGML
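
For what it's worth, the values hard-coded in the patch can be cross-checked against the model's own configuration. A small sketch using the standard `transformers` API (attribute names follow `GPTBigCodeConfig`):

```python
# Sketch: print StarCoder's hyperparameters from the Hugging Face config,
# for comparison with the defaults in main.cpp / quantize.cpp above.
from transformers import AutoConfig

cfg = AutoConfig.from_pretrained("bigcode/starcoder")
print("n_ctx   =", cfg.n_positions)  # maximum context length
print("n_embd  =", cfg.n_embd)       # embedding width
print("n_head  =", cfg.n_head)       # attention heads
print("n_layer =", cfg.n_layer)      # transformer layers
print("n_vocab =", cfg.vocab_size)   # vocabulary size
```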

@ggerganov
Owner

@s-kostyaev I don't think you need this patch; the correct parameters are loaded from the model file.
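
In other words, the struct members in the patch above are only defaults; the loader overwrites them with the values stored in the converted file's header. A rough sketch of inspecting that header, assuming the layout the example converters write (an int32 magic followed by the six hparams, little-endian, in the same order as the struct); check convert-hf-to-ggml.py before relying on it:

```python
# Sketch: read the hparams back out of a converted GGML model file.
# Assumed layout: int32 magic, then n_vocab, n_ctx, n_embd, n_head,
# n_layer, ftype as consecutive little-endian int32 values.
import struct
import sys

with open(sys.argv[1], "rb") as f:
    magic, n_vocab, n_ctx, n_embd, n_head, n_layer, ftype = struct.unpack(
        "<7i", f.read(7 * 4)
    )

print(f"magic   = {magic:#010x}")
print(f"n_vocab = {n_vocab}, n_ctx = {n_ctx}, n_embd = {n_embd}")
print(f"n_head  = {n_head}, n_layer = {n_layer}, ftype = {ftype}")
```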

@s-kostyaev

@ggerganov OK. Then why doesn't it work for @seyyedaliayati?

@s-kostyaev

@ggerganov You are right. Without the patch everything works fine. Thanks for the information.

@seyyedaliayati
Author

> The correct parameters are loaded from the model file

So why do I get this error? If you need more information, please let me know. Thanks.

@seyyedaliayati
Author

I just re-ran it, and now I get this:

```text
(base) ali@host:~/ggml$ python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
Loading model:  bigcode/starcoder
Downloading (…)l-00001-of-00007.bin: 100%|█████████████████████████████████████████| 9.90G/9.90G [05:09<00:00, 32.0MB/s]
Downloading (…)l-00002-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:51<00:00, 33.8MB/s]
Downloading (…)l-00003-of-00007.bin: 100%|█████████████████████████████████████████| 9.85G/9.85G [04:59<00:00, 32.9MB/s]
Downloading (…)l-00004-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:53<00:00, 33.6MB/s]
Downloading (…)l-00005-of-00007.bin: 100%|█████████████████████████████████████████| 9.85G/9.85G [04:55<00:00, 33.3MB/s]
Downloading (…)l-00006-of-00007.bin: 100%|█████████████████████████████████████████| 9.86G/9.86G [04:58<00:00, 33.1MB/s]
Downloading (…)l-00007-of-00007.bin: 100%|█████████████████████████████████████████| 4.08G/4.08G [01:56<00:00, 34.9MB/s]
Downloading shards: 100%|████████████████████████████████████████████████████████████████| 7/7 [31:46<00:00, 272.30s/it]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [03:42<00:00, 31.77s/it]
Killed
(base) ali@host:~/ggml$ python examples/starcoder/convert-hf-to-ggml.py bigcode/starcoder
Loading model:  bigcode/starcoder
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████| 7/7 [03:38<00:00, 31.20s/it]
Killed
```

Is it because I am running on WSL?

@jaeminSon
Contributor

I'm afraid you may be lacking system memory for the conversion.
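
A plain "Killed" with no Python traceback is the usual signature of the Linux out-of-memory killer. The fp32 checkpoint is roughly 63 GB across the seven shards, so the conversion needs at least that much RAM just to hold the weights. A small sketch for checking headroom up front, using the third-party psutil package:

```python
# Sketch: compare available RAM against a rough estimate of what loading
# the fp32 StarCoder checkpoint needs (~15.5B parameters * 4 bytes each).
import psutil

bytes_needed = 15.5e9 * 4  # approximate weight memory, ~58 GiB
avail = psutil.virtual_memory().available

print(f"need  ~{bytes_needed / 2**30:.0f} GiB")
print(f"avail  {avail / 2**30:.0f} GiB")
if avail < bytes_needed:
    print("conversion will likely be OOM-killed; add RAM or swap first")
```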

@seyyedaliayati
Author

> I'm afraid you may be lacking system memory for the conversion.

You are right. I increased my RAM and the issue is solved!
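
For anyone hitting this under WSL specifically: WSL2 caps the VM's memory at a fraction of the host's RAM by default, so the OOM kill can fire well below the machine's physical total. The cap can be raised in `%UserProfile%\.wslconfig` on the Windows side; the values below are illustrative, adjust to your host:

```ini
# Example .wslconfig raising the WSL2 memory limit
[wsl2]
memory=64GB
```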
