Can't run replit-code-instruct-glaive.ggmlv1.q8_0 #453

gcc -v
Model: TheBloke/Replit-Code-Instruct-Glaive-GGML

Comments
I think you should run the model using the replit inference example, not the starcoder one.
I'm ashamed of such a stupid mistake. @klosax Can you tell me why the model doesn't generate anything with --temp 0?
The example seems to ignore <|endoftext|> from the model.
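If the generation loop never compares the sampled token against the model's <|endoftext|> id, output either runs past the intended stopping point or, with greedy sampling (--temp 0), appears to produce nothing useful. Below is a minimal, self-contained sketch of the missing check; `sample_token`, `token_to_text`, and the token ids are toy stand-ins for illustration, not the actual ggml example API:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Dummy stand-ins for the example's real sampling and vocab code
// (illustrative only; the actual ggml API differs).
static int32_t sample_token(int step) {
    // Pretend the model emits token 42 twice, then end-of-text (id 1).
    return step < 2 ? 42 : 1;
}
static std::string token_to_text(int32_t id) {
    return "tok" + std::to_string(id) + " ";
}

int main() {
    const int32_t eot_token_id = 1;  // id of <|endoftext|> in this toy vocab
    const int     n_predict    = 16;

    for (int i = 0; i < n_predict; ++i) {
        const int32_t id = sample_token(i);
        // The fix: stop as soon as the model emits end-of-text,
        // instead of ignoring it and sampling further.
        if (id == eot_token_id) {
            break;
        }
        printf("%s", token_to_text(id).c_str());
    }
    printf("\n");
    return 0;
}
```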
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue on Dec 18, 2023:
…ganov#453)

* Support calling mlock() on loaded model data on Linux and macOS

This is enabled by a new --mlock command line option. Using mlock()
disables swapping and memory compression for the model data. Doing so
can be useful on systems where the model takes up a large fraction of
system RAM. In my experience, macOS is quite eager to start compressing
llama.cpp's memory, which then makes it halt for a few seconds while it
decompresses, even with a model that uses "only" 25GB out of 32GB.

Of course, this comes at the cost of forcing the system to swap or
compress other processes' memory instead, so it needs to be used with
care and shouldn't be enabled by default. In theory it should be
possible to support this on Windows as well using VirtualLock(), but
I'm not much of a Windows user.

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
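For reference, the mlock() call the commit describes is plain POSIX. Here is a minimal sketch of pinning model data in RAM on Linux/macOS; the buffer and sizes are illustrative, not the commit's actual implementation:

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t model_size = 16 * 1024 * 1024; // toy stand-in for model data
    void * data = malloc(model_size);
    if (!data) return 1;
    memset(data, 0, model_size); // touch pages so they are resident

    // Lock the pages so the OS will neither swap nor compress them.
    if (mlock(data, model_size) != 0) {
        // Commonly fails with ENOMEM when RLIMIT_MEMLOCK is too low.
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
    }

    // ... run inference over the locked buffer ...

    munlock(data, model_size);
    free(data);
    return 0;
}
```

Locked pages count against the process's RLIMIT_MEMLOCK limit (see `ulimit -l`), and they force the OS to swap or compress other processes instead, which is why the commit message cautions against enabling the option by default.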