
Inference broken with starcoderplus-guanaco-gpt4.ggmlv1.q8_0.bin since 43ffec5 #378

Closed
tim-janik opened this issue Jul 12, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@tim-janik

A couple of days ago, starcoder with starcoderplus-guanaco-gpt4 was perfectly capable of generating a C++ function that validates UTF-8 strings. That is no longer the case: inference now gives answers that do not fit the prompt; most often it says the question is unclear, or it starts referencing the civil war, toxic words, etc. I've bisected this, starting from July 02:

965568d from 2023-07-02. Generates isUTF8() just fine, GOOD.
d5c4ce0 from 2023-07-04. Generates isUTF8() just fine. Last GOOD version.
bfc6d42 Aborts with GGML_ASSERT, skipping in bisect
d8fbf15 Aborts with GGML_ASSERT, skipping in bisect
43ffec5 from 2023-07-05. Fails, first BAD version. Generates e.g. "Your question is a bit unclear" or starts talking about some random website.

Here's how to reproduce it. Prompt (with teacher forcing):

### Human: Write a function to check a C string for valid UTF-8 encoding without using external libs in C++.
### Assistant: Sure, here's the function:
```cpp

And the command line:

build/bin/starcoder -t 12 -m models/starcoderplus-guanaco-gpt4.ggmlv1.q8_0.bin -n 4096 --top_p 0.3 --temp 1 \
	--top_k 9999 -f p-prompt.txt

Expected output is along the lines of:

#include <string.h>
bool isUTF8(const char* str) { /* checks for (*str < 0x80) etc */ }
@TheBloke
Contributor

FYI, I only uploaded the quants of Starcoderplus Guanaco 8 hours ago, and the unquantised model was only released 17 hours ago, so you couldn't have been testing it 2 days ago :)

Were your earlier comparisons with a different model, like WizardCoder Guanaco? https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML

@tim-janik
Author

FYI, I only uploaded the quants of Starcoderplus Guanaco 8 hours ago, and the unquantised model was only released 17 hours ago, so you couldn't have been testing it 2 days ago :)

Were your earlier comparisons with a different model, like WizardCoder Guanaco? https://huggingface.co/TheBloke/WizardCoder-Guanaco-15B-V1.0-GGML

I know; I downloaded the model a few hours ago from your Hugging Face account. Thanks for providing it.

With "couple days go", I'm referring to an old ggml build of 965568d that I used to test Starcoderplus-Guanaco-GPT4-15B-V1.0-GGML (which worked), before I pulled ggml from today, which is broken. Thus the bisect. Here are the attempts I made with Starcoderplus-Guanaco-GPT4-15B-V1.0-GGML in more detail:

git bisect start
# good: [965568dcd722462466afc1a729be55fb884ab64c] dolly : add interactive prompt and port mode (#319)
git bisect good 965568dcd722462466afc1a729be55fb884ab64c
# bad: [f6365c0605ac86c6ab106cda0e8d6650e54097a7] ggml : apply mul_mat broadcast fix (sync llama.cpp)
git bisect bad f6365c0605ac86c6ab106cda0e8d6650e54097a7
# bad: [3ea676a3a6b40d5b10d5e3cf73887838782aa830] ggml : sync llama.cpp (fix for #341)
git bisect bad 3ea676a3a6b40d5b10d5e3cf73887838782aa830
# skip: [bfc6d42f8c2141383e4f21e4a030688c71560da0] ggml : sync llama.cpp (generalize quantize_fns + CUDA improvements)
git bisect skip bfc6d42f8c2141383e4f21e4a030688c71560da0
# bad: [ad2754ef9fe64c80d8baa63b5a3ad362c8ea227a] pkg-config : fix typo in includedir (#367)
git bisect bad ad2754ef9fe64c80d8baa63b5a3ad362c8ea227a
# good: [b98cd8689f74ed69432323ef5a15369d96086ae1] whisper : fix wrong variable name from previous commit
git bisect good b98cd8689f74ed69432323ef5a15369d96086ae1
# bad: [43ffec5f7a927094a1148c800f8bf9ec3aadc198] ggml : fix bug introduced in bfc6d42f8c2141383e4f21e4a030688c71560da0
git bisect bad 43ffec5f7a927094a1148c800f8bf9ec3aadc198
# good: [d5c4ce0b45accdf828917b42fabfe7fc7d45364f] cmake : fix public header path for submodules (#342)
git bisect good d5c4ce0b45accdf828917b42fabfe7fc7d45364f
# skip: [d8fbf15c60a2e7136b9e3eea11a7ebb51ee8ab07] tests : sync from llama.cpp and disable some obsolete tests
git bisect skip d8fbf15c60a2e7136b9e3eea11a7ebb51ee8ab07
# only skipped commits left to test
# possible first bad commit: [43ffec5f7a927094a1148c800f8bf9ec3aadc198] ggml : fix bug introduced in bfc6d42f8c2141383e4f21e4a030688c71560da0
# possible first bad commit: [d8fbf15c60a2e7136b9e3eea11a7ebb51ee8ab07] tests : sync from llama.cpp and disable some obsolete tests
# possible first bad commit: [bfc6d42f8c2141383e4f21e4a030688c71560da0] ggml : sync llama.cpp (generalize quantize_fns + CUDA improvements)

@ggerganov added the bug (Something isn't working) label on Jul 14, 2023
@ggerganov
Owner

@tim-janik

Please try the latest master and let me know if the issue persists. I think I just fixed a bug that could have caused this.
I'm downloading the model and will check on my end too

@ggerganov
Owner

ggerganov commented Jul 14, 2023

I just tested the Q8_0 model and it seems to work correctly after the fix:

$ ▶ make -j && ./bin/starcoder -t 8 -m models/starcoder/starcoderplus-guanaco-gpt4.ggmlv1.q8_0.bin -n 4096 --top_p 0.3 --temp 1 --top_k 9999 -f p-prompt.txt
[ 1%] Building C object src/CMakeFiles/ggml.dir/ggml.c.o
[ 4%] Built target common
[ 5%] Linking C static library libggml.a
[ 5%] Built target ggml
[ 7%] Linking C executable ../bin/test-grad0
[ 10%] Linking CXX executable ../bin/test-quantize-fns
[ 10%] Linking C executable ../bin/test-vec0
[ 14%] Linking C executable ../bin/test-opt
[ 14%] Linking C executable ../bin/test-mul-mat2
[ 15%] Linking C executable ../bin/test1
[ 17%] Linking C executable ../bin/test-vec1
[ 20%] Linking C executable ../bin/test3
[ 20%] Linking CXX executable ../bin/test-quantize-perf
[ 22%] Linking C executable ../bin/test-pool
[ 24%] Linking C executable ../bin/test2
[ 24%] Linking CXX executable ../../bin/mnist-cpu
[ 24%] Linking C executable ../bin/test-mul-mat0
[ 28%] Built target whisper-cpp
[ 28%] Linking C executable ../bin/test0
[ 32%] Linking CXX executable ../../bin/mnist
[ 32%] Built target common-ggml
[ 35%] Linking CXX executable ../../bin/whisper
[ 35%] Linking CXX executable ../../bin/gpt-j
[ 41%] Linking CXX executable ../../bin/dollyv2-quantize
[ 41%] Linking CXX executable ../../bin/gpt-2-quantize
[ 41%] Linking CXX executable ../../bin/gpt-2
[ 42%] Linking CXX executable ../../bin/mpt
[ 44%] Linking CXX executable ../../bin/replit-quantize
[ 45%] Linking CXX executable ../../bin/mpt-quantize
[ 51%] Linking CXX executable ../../bin/whisper-quantize
[ 51%] Linking CXX executable ../../bin/replit
[ 51%] Linking CXX executable ../../bin/gpt-j-quantize
[ 51%] Linking CXX executable ../../bin/gpt-neox-quantize
[ 52%] Linking CXX executable ../../bin/starcoder
[ 54%] Linking CXX executable ../../bin/starcoder-quantize
[ 54%] Linking CXX executable ../../bin/gpt-neox
[ 55%] Linking CXX executable ../../bin/dollyv2
[ 58%] Built target test-vec0
[ 58%] Built target test-vec1
[ 68%] Built target test-mul-mat2
[ 68%] Built target test2
[ 68%] Built target test-mul-mat0
[ 68%] Built target test0
[ 68%] Built target test-grad0
[ 68%] Built target test1
[ 68%] Built target test-pool
[ 71%] Built target test-opt
[ 71%] Built target test3
[ 74%] Built target test-quantize-fns
[ 74%] Built target mnist-cpu
[ 75%] Built target replit-quantize
[ 77%] Built target test-quantize-perf
[ 78%] Built target gpt-j-quantize
[ 80%] Built target gpt-neox-quantize
[ 81%] Built target dollyv2-quantize
[ 82%] Built target dollyv2
[ 84%] Built target gpt-2-quantize
[ 85%] Built target gpt-j
[ 87%] Built target gpt-2
[ 90%] Built target whisper
[ 90%] Built target mnist
[ 91%] Built target gpt-neox
[100%] Built target mpt-quantize
[100%] Built target replit
[100%] Built target starcoder-quantize
[100%] Built target starcoder
[100%] Built target mpt
[100%] Built target whisper-quantize
main: seed = 1689324709
starcoder_model_load: loading model from 'models/starcoder/starcoderplus-guanaco-gpt4.ggmlv1.q8_0.bin'
starcoder_model_load: n_vocab = 49152
starcoder_model_load: n_ctx = 8192
starcoder_model_load: n_embd = 6144
starcoder_model_load: n_head = 48
starcoder_model_load: n_layer = 40
starcoder_model_load: ftype = 2007
starcoder_model_load: qntvr = 2
starcoder_model_load: ggml ctx size = 34536.47 MB
starcoder_model_load: memory size = 15360.00 MB, n_mem = 327680
starcoder_model_load: model size = 19176.23 MB
extract_tests_from_file : No test file found.
test_gpt_tokenizer : 0 tests failed out of 0 tests.

main: temp = 1.000
main: top_k = 9999
main: top_p = 0.300
main: repeat_last_n = 64
main: repeat_penalty = 1.000
main: prompt: '### Human: Write a function to check a C string for valid UTF-8 encoding without using external libs in C++.
### Assistant: Sure, here's the function:
```cpp'

main: number of tokens in prompt = 38
main: token[0] =   1482, ###
main: token[1] =  26929,  Human
main: token[2] =     44, :
main: token[3] =   5950,  Write
main: token[4] =    312,  a
main: token[5] =    667,  function
main: token[6] =    372,  to
main: token[7] =   1505,  check
main: token[8] =    312,  a
main: token[9] =    390,  C
main: token[10] =    802,  string
main: token[11] =    436,  for
main: token[12] =   2080,  valid
main: token[13] =   9354,  UTF
main: token[14] =     31, -
main: token[15] =     42, 8
main: token[16] =   7328,  encoding
main: token[17] =   2876,  without
main: token[18] =   1471,  using
main: token[19] =   6594,  external
main: token[20] =  25405,  libs
main: token[21] =    328,  in
main: token[22] =    390,  C
main: token[23] =  33749, ++.
main: token[24] =    203, 

main: token[25] =   1482, ###
main: token[26] =  34043,  Assistant
main: token[27] =     44, :
main: token[28] =  33169,  Sure
main: token[29] =     30, ,
main: token[30] =   2442,  here
main: token[31] =   1182, 's
main: token[32] =    322,  the
main: token[33] =    667,  function
main: token[34] =     44, :
main: token[35] =    203, 

main: token[36] =    914, ```
main: token[37] =   3766, cpp


### Human: Write a function to check a C string for valid UTF-8 encoding without using external libs in C++.
### Assistant: Sure, here's the function:
```cpp
#include <string.h>

bool isUTF8(const char* str) {
    while (*str) {
        if (*str < 0x80) {
            str++;
        } else if (*str < 0xC2) {
            return false;
        } else if (*str < 0xE0) {
            if (*(str + 1) < 0x80 || *(str + 1) > 0xBF) {
                return false;
            }
            str += 2;
        } else if (*str < 0xF0) {
            if (*(str + 1) < 0x80 || *(str + 1) > 0xBF ||
                *(str + 2) < 0x80 || *(str + 2) > 0xBF) {
                return false;
            }
            str += 3;
        } else if (*str < 0xF4) {
            if (*(str + 1) < 0x80 || *(str + 1) > 0xBF ||
                *(str + 2) < 0x80 || *(str + 2) > 0xBF ||
                *(str + 3) < 0x80 || *(str + 3) > 0xBF) {
                return false;
            }
            str += 4;
        } else {
            return false;
        }
    }
    return true;
}

This function is a state machine that checks each byte of a string for a valid UTF-8 encoding. It is not a complete implementation of the UTF-8 standard, but it will catch most of the common errors.

The function returns true if the string is valid UTF-8, and false if it is not.

It is important to note that this function is not foolproof. It will not catch all invalid UTF-8 encodings. For instance, it will not check for overlong encodings, or encodings that use the 0xFE and 0xFF bytes.```<|endoftext|>

main: mem per token = 462024 bytes
main: load time = 5986.67 ms
main: sample time = 1124.35 ms
main: predict time = 244805.97 ms / 506.84 ms per token
main: total time = 253612.77 ms

Please reopen if the issue persists
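
As a side note, for anyone who wants to sanity-check the generated isUTF8() outside the model, a tiny driver along the following lines can be used (just a sketch, not part of the model output; it assumes the generated function is pasted into the same file). Keep in mind that the generated code compares plain char values directly, so for bytes >= 0x80 the result depends on whether char is signed on your platform:

#include <cstdio>

bool isUTF8(const char* str); // definition: the generated function above

int main() {
    // Plain ASCII is accepted regardless of char signedness.
    std::printf("ascii:  %d\n", isUTF8("hello world"));
    // "café" with a well-formed 2-byte sequence; also accepted either way.
    std::printf("2-byte: %d\n", isUTF8("caf\xC3\xA9"));
    // A stray 0xFF byte is rejected where char is unsigned, but accepted where
    // char is signed, because 0xFF then compares as a negative value (< 0x80).
    std::printf("stray:  %d\n", isUTF8("bad\xFF"));
    return 0;
}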

@tim-janik
Author

@tim-janik

Please try the latest master and let me know if the issue persists. I think I just fixed a bug that could have caused this. I'm downloading the model and will check on my end too

Confirmed, current master 9e3c293 works as expected.
