Bug: The output of llama-cli is not the same as the output of llama-server #7973
Comments
I have been told that I need to use the specific prompt for instruct models, which I use in my config, but it still does not work with Llama 3 instruct. I am still waiting for a reply, see here: #7929 (comment)
I would like to know if you have tried to compare the results of llama-cli and llama-server?
@ztrong-forever llama-server seems to work fine if you select the right "prompt style" (llama3 in this case). llama-cli, if run with a small context like 512, stops outputting anything once the context window is filled; the server, after the context window is filled, just prints an empty line, slashes, or other strange things. Here is how I run the server: ./llama-server -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --gpu-layers 35 -c 512 — then in the new UI select llama 3. Screencast.from.2024-06-18.04.17.48.PM.webm Here is the command for the CLI: llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io --interactive -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3
Thanks! It works on my side as well!
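The symptoms described above are consistent with the prompt not being wrapped in the Llama 3 instruct chat template on the CLI side. As a rough illustration of what --chat-template llama3 produces, here is a minimal sketch that builds the prompt string by hand, using the published Llama 3 special tokens (this is an approximation for illustration, not the exact llama.cpp implementation):

```python
def format_llama3_prompt(messages):
    """Build a Llama 3 instruct prompt from a list of {"role", "content"} dicts.

    Approximates what `--chat-template llama3` does in llama.cpp; the
    special tokens below are the published Llama 3 instruct ones.
    """
    prompt = "<|begin_of_text|>"
    for m in messages:
        prompt += (
            f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
            f"{m['content']}<|eot_id|>"
        )
    # Leave the assistant header open so the model generates the reply.
    prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt

print(format_llama3_prompt([{"role": "user", "content": "Hello"}]))
```

If the raw prompt lacks this wrapping, an instruct-tuned model can degenerate into empty lines or junk tokens once the context fills, which matches the behavior reported here.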
This issue was closed because it has been inactive for 14 days since being marked as stale. |
What happened?
Run llama-cli:
./bin/llama-cli -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -n 512 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt
Run llama-server:
./bin/llama-server -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -c 2048
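To compare the two front-ends on equal terms, the running llama-server can be queried over its OpenAI-compatible endpoint, which applies the model's chat template server-side. A minimal stdlib-only sketch (the host/port and greedy temperature are assumptions for reproducible comparison, not from the original report):

```python
import json
from urllib import request

def build_chat_payload(prompt, temperature=0.0):
    # Greedy decoding (temperature 0) so repeated runs are comparable.
    return {
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
    }

def chat(prompt, host="http://127.0.0.1:8080"):
    """Send one user message to a running llama-server and return the reply."""
    data = json.dumps(build_chat_payload(prompt)).encode()
    req = request.Request(
        f"{host}/v1/chat/completions",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["choices"][0]["message"]["content"]
```

Running the same question through this helper and through llama-cli (with matching sampling settings) makes any template-related divergence easy to see.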
Name and Version
llama-cli:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
llama-server:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
What operating system are you seeing the problem on?
Linux
Relevant log output
No response