
Can't run replit-code-instruct-glaive.ggmlv1.q8_0 #453

Closed

Jipok opened this issue Aug 15, 2023 · 3 comments
Jipok commented Aug 15, 2023

gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib64/gcc/x86_64-unknown-linux-gnu/12.2.0/lto-wrapper
Target: x86_64-unknown-linux-gnu
Configured with: /builddir/gcc-12.2.0/configure --build=x86_64-unknown-linux-gnu --enable-gnu-unique-object --enable-vtable-verify --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --libexecdir=/usr/lib64 --libdir=/usr/lib64 --enable-threads=posix --enable-__cxa_atexit --disable-multilib --with-system-zlib --enable-shared --enable-lto --enable-plugins --enable-linker-build-id --disable-werror --disable-nls --enable-default-pie --enable-default-ssp --enable-checking=release --disable-libstdcxx-pch --with-isl --with-linker-hash-style=gnu --disable-sjlj-exceptions --disable-target-libiberty --disable-libssp --enable-languages=c,c++,objc,obj-c++,fortran,lto,go,ada
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 12.2.0 (GCC)
GGML Version: 95b559d

Model: TheBloke/Replit-Code-Instruct-Glaive-GGML

./bin/starcoder -t 8 -m ~/replit-code-instruct-glaive.ggmlv1.q8_0.bin --top_k 0 --top_p 0.95 --temp 0 -p ...
main: seed = 1692118804
starcoder_model_load: loading model from '/home/gpt-libre/replit-code-instruct-glaive.ggmlv1.q8_0.bin'
starcoder_model_load: n_vocab = 2560
starcoder_model_load: n_ctx   = 2048
starcoder_model_load: n_embd  = 32
starcoder_model_load: n_head  = 32
starcoder_model_load: n_layer = 32768
starcoder_model_load: ftype   = 2007
starcoder_model_load: qntvr   = 2
starcoder_model_load: invalid model file '/home/gpt-libre/replit-code-instruct-glaive.ggmlv1.q8_0.bin' (bad vocab size 7 != 2560)
main: failed to load model from '/home/gpt-libre/replit-code-instruct-glaive.ggmlv1.q8_0.bin'
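The scrambled hyperparameters in the log above (n_embd = 32, n_layer = 32768) are what a loader prints when it interprets a model header with the wrong field layout. A minimal sketch of the effect, using hypothetical field orders and assumed values that are not the real replit/starcoder GGML layouts:

```python
import struct

# Hypothetical hyperparameter layouts, for illustration only -- the actual
# replit and starcoder GGML headers differ from these. Each loader reads a
# fixed run of little-endian int32 fields after the file magic.
replit_order = ("n_vocab", "n_embd", "n_head", "n_layer", "ftype")
starcoder_order = ("n_vocab", "n_ctx", "n_embd", "n_head", "n_layer")

# Plausible replit hyperparameters (assumed values):
true_params = {"n_vocab": 32768, "n_embd": 2560, "n_head": 32,
               "n_layer": 32, "ftype": 7}
header = struct.pack("<5i", *(true_params[f] for f in replit_order))

# A loader expecting the starcoder field order decodes the same bytes
# into nonsense, much like the garbled values in the log:
misread = dict(zip(starcoder_order, struct.unpack("<5i", header)))
print(misread)
```

The bytes are valid either way; only the schema the reader assumes decides whether the numbers make sense, which is why the sanity check on vocab size is what finally aborts the load.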
klosax (Contributor) commented Aug 16, 2023

I think you should run the model using the replit inference example, not starcoder.

Jipok (Author) commented Aug 16, 2023

I'm ashamed of such a stupid mistake.

@klosax Can you tell me why the model doesn't generate anything with --temp 0?
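The --temp 0 question goes unanswered in the thread. One common pitfall in samplers generally (an assumption offered here, not a diagnosis of this codebase): temperature scaling divides the logits by temp, so temp = 0 must be special-cased to greedy argmax or the division blows up. A sketch:

```python
import math
import random

def sample(logits, temp, rng=random.Random(0)):
    # Greedy special case: temp == 0 should mean argmax. Implementations
    # that skip this check divide by zero and produce garbage or nothing.
    if temp == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    # Standard temperature sampling: softmax over logits / temp.
    scaled = [l / temp for l in logits]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    total = sum(weights)
    r = rng.random() * total
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if r < acc:
            return i
    return len(logits) - 1

print(sample([0.1, 2.0, 0.5], temp=0))  # argmax index: 1
```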

@Jipok Jipok closed this as completed Aug 16, 2023
@Jipok
Copy link
Author


The example seems to ignore the <|endoftext|> token emitted by the model.
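Honoring end-of-text is normally handled in the decode loop. A minimal sketch of that stopping condition, with arbitrary stand-in token ids rather than replit's actual vocabulary:

```python
def generate(next_token, eos_id, max_tokens=16):
    # Minimal decode loop: stop as soon as the model emits the EOS id,
    # instead of generating until the token budget runs out.
    out = []
    for _ in range(max_tokens):
        tok = next_token()
        if tok == eos_id:
            break
        out.append(tok)
    return out

# Toy "model" that emits EOS (id 0 here, an arbitrary choice) after 3 tokens:
stream = iter([5, 7, 9, 0, 11])
print(generate(lambda: next(stream), eos_id=0))  # [5, 7, 9]
```

An example that skips this check keeps decoding past <|endoftext|>, which matches the behavior reported above.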

CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue Dec 18, 2023
…ganov#453)

* Support calling mlock() on loaded model data on Linux and macOS

This is enabled by a new --mlock command line option.

Using mlock() disables swapping and memory compression for the model
data.  Doing so can be useful on systems where the model takes up a
large fraction of system RAM.  In my experience, macOS is quite eager to
start compressing llama.cpp's memory, which then makes it halt for a few
seconds while it decompresses, even with a model that uses "only" 25GB
out of 32GB.

Of course, this comes at the cost of forcing the system to swap or
compress other processes' memory instead, so it needs to be used with
care and shouldn't be enabled by default.

In theory it should be possible to support this on Windows as well using
VirtualLock(), but I'm not much of a Windows user.

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
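The behavior the commit describes can be sketched from Python via libc (Linux-only; the buffer size and error handling are illustrative, not the patch's actual C++ code, which locks the whole mapped model file):

```python
import ctypes

# Load libc via the running process's symbols (works on Linux).
libc = ctypes.CDLL(None, use_errno=True)
libc.mlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]
libc.munlock.argtypes = [ctypes.c_void_p, ctypes.c_size_t]

# Lock one small buffer into RAM so it cannot be swapped out or
# compressed -- the same syscall --mlock applies to the model data.
buf = ctypes.create_string_buffer(4096)
rc = libc.mlock(ctypes.addressof(buf), ctypes.sizeof(buf))
if rc == 0:
    libc.munlock(ctypes.addressof(buf), ctypes.sizeof(buf))
    result = "locked"
else:
    # RLIMIT_MEMLOCK can make mlock fail for unprivileged processes,
    # one reason locking a multi-GB model may need raised limits.
    result = "mlock failed"
print(result)
```

As the commit notes, locked pages are taken away from the rest of the system, so this is an opt-in flag rather than a default.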