Can't run replit-code-instruct-glaive.ggmlv1.q8_0 #453

gcc -v
Model: TheBloke/Replit-Code-Instruct-Glaive-GGML

Comments
I think you should run the model using the replit inference example, not the starcoder one.
I'm ashamed of such a stupid mistake. @klosax Can you tell me why the model doesn't generate anything with --temp 0?
The example seems to ignore <|endoftext|> from the model.
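If the generation loop never compares the sampled token against the model's <|endoftext|> id, output either runs past the intended stopping point or, with greedy sampling (--temp 0), appears to produce nothing useful. Below is a minimal, self-contained sketch of the missing check; `sample_token`, `token_to_text`, and the token ids are toy stand-ins for illustration, not the actual ggml example API:

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Dummy stand-ins for the example's real sampling and vocab code
// (illustrative only; the actual ggml API differs).
static int32_t sample_token(int step) {
    // Pretend the model emits token 42 twice, then end-of-text (id 1).
    return step < 2 ? 42 : 1;
}
static std::string token_to_text(int32_t id) {
    return "tok" + std::to_string(id) + " ";
}

int main() {
    const int32_t eot_token_id = 1;  // id of <|endoftext|> in this toy vocab
    const int     n_predict    = 16;

    for (int i = 0; i < n_predict; ++i) {
        const int32_t id = sample_token(i);
        // The fix: stop as soon as the model emits end-of-text,
        // instead of ignoring it and sampling further.
        if (id == eot_token_id) {
            break;
        }
        printf("%s", token_to_text(id).c_str());
    }
    printf("\n");
    return 0;
}
```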
CCLDArjun pushed a commit to CCLDArjun/ggml that referenced this issue on Dec 18, 2023:
…ganov#453)

* Support calling mlock() on loaded model data on Linux and macOS

This is enabled by a new --mlock command line option. Using mlock()
disables swapping and memory compression for the model data. Doing so
can be useful on systems where the model takes up a large fraction of
system RAM. In my experience, macOS is quite eager to start compressing
llama.cpp's memory, which then makes it halt for a few seconds while it
decompresses, even with a model that uses "only" 25GB out of 32GB.

Of course, this comes at the cost of forcing the system to swap or
compress other processes' memory instead, so it needs to be used with
care and shouldn't be enabled by default. In theory it should be
possible to support this on Windows as well using VirtualLock(), but
I'm not much of a Windows user.

* Update llama.cpp

---------

Co-authored-by: Georgi Gerganov <[email protected]>
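For reference, the mlock() call the commit describes is plain POSIX. Here is a minimal sketch of pinning model data in RAM on Linux/macOS; the buffer and sizes are illustrative, not the commit's actual implementation:

```cpp
#include <sys/mman.h>
#include <cerrno>
#include <cstdio>
#include <cstdlib>
#include <cstring>

int main() {
    const size_t model_size = 16 * 1024 * 1024; // toy stand-in for model data
    void * data = malloc(model_size);
    if (!data) return 1;
    memset(data, 0, model_size); // touch pages so they are resident

    // Lock the pages so the OS will neither swap nor compress them.
    if (mlock(data, model_size) != 0) {
        // Commonly fails with ENOMEM when RLIMIT_MEMLOCK is too low.
        fprintf(stderr, "mlock failed: %s\n", strerror(errno));
    }

    // ... run inference over the locked buffer ...

    munlock(data, model_size);
    free(data);
    return 0;
}
```

Locked pages count against the process's RLIMIT_MEMLOCK limit (see `ulimit -l`), and they force the OS to swap or compress other processes instead, which is why the commit message cautions against enabling the option by default.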