
Bug: The output of llama-cli is not the same as the output of llama-server #7973

Closed
ztrong-forever opened this issue Jun 17, 2024 · 5 comments
Labels
bug-unconfirmed · low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches) · stale

Comments

@ztrong-forever

What happened?

run llama-cli:

./bin/llama-cli -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -n 512 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

  • output: Here I get the desired result
    [screenshot]

run llama-server:

./bin/llama-server -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -c 2048

  • nodejs code: [screenshot] (a sketched reconstruction follows after this list)
  • output: The results here are confusing. How do I make them consistent?
    [screenshot]
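
Since the nodejs code only survives as a screenshot, here is a minimal sketch of what such a request to llama-server's /completion endpoint might look like; the prompt text and field values are assumptions chosen to mirror the llama-cli flags above (run with Node 18+ as an ES module):

```js
// Hedged sketch: POST the same prompt and sampling settings that
// llama-cli received to llama-server's /completion endpoint.
// The prompt below is an assumed stand-in for prompts/chat-with-bob.txt.
const response = await fetch("http://127.0.0.1:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt:
      "Transcript of a dialog where the User interacts with an assistant named Bob.\n" +
      "User: Hello, Bob.\nBob:",
    n_predict: 512,      // mirrors -n 512
    repeat_penalty: 1.0, // mirrors --repeat_penalty 1.0
    stop: ["User:"],     // mirrors the -r "User:" reverse prompt
  }),
});
const data = await response.json();
console.log(data.content);
```

Note that /completion treats the prompt as raw text: if the model expects the Llama 3 instruct template, a raw Bob-style prompt will produce different output than a template-formatted request, which is likely the inconsistency reported here.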

Name and Version

llama-cli:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

llama-server:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

ztrong-forever added the bug-unconfirmed and low severity labels on Jun 17, 2024
@dspasyuk
Contributor

dspasyuk commented Jun 17, 2024

I have been told that I need to use a model-specific prompt format for instruct models, which I use in my config, but it still does not work with Llama 3 Instruct. I am still waiting for a reply; see here: #7929 (comment)

@ztrong-forever
Author

> I have been told that I need to use a model-specific prompt format for instruct models, which I use in my config, but it still does not work with Llama 3 Instruct. I am still waiting for a reply; see here: #7929 (comment)

I would like to know whether you have tried comparing the results of llama-cli and llama-server.

@dspasyuk
Contributor

dspasyuk commented Jun 18, 2024

@ztrong-forever llama-server seems to work fine if you select the right "prompt style" (llama3 in this case). If llama-cli is run with a small context size like 512, it stops outputting anything once the context window is filled; the server, once the context window is filled, just prints an empty line, slashes, or other strange things:

Here is how I run the server: ./llama-server -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --gpu-layers 35 -c 512, then in the new UI select llama 3.

Screencast.from.2024-06-18.04.17.48.PM.webm

Here is the command for llama-cli:

llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io --interactive -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3
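
For callers hitting the server from code rather than the web UI, the analogous fix is to let the server apply the chat template instead of sending a raw prompt. A minimal sketch, assuming the server was started as above and exposes the OpenAI-compatible endpoint on the default port 8080 (the message contents are placeholder assumptions):

```js
// Hedged sketch: the /v1/chat/completions endpoint formats the messages
// with the model's chat template (Llama 3 here) before inference,
// matching what --chat-template llama3 does for llama-cli.
const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" },
    ],
    temperature: 0.3, // mirrors --temp 0.3 from the llama-cli command
  }),
});
const json = await res.json();
console.log(json.choices[0].message.content);
```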

@ztrong-forever
Author

> @ztrong-forever llama-server seems to work fine if you select the right "prompt style" (llama3 in this case). If llama-cli is run with a small context size like 512, it stops outputting anything once the context window is filled; the server, once the context window is filled, just prints an empty line, slashes, or other strange things:
>
> Here is how I run the server: ./llama-server -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --gpu-layers 35 -c 512, then in the new UI select llama 3.
>
> Screencast.from.2024-06-18.04.17.48.PM.webm
>
> Here is the command for llama-cli:
>
> llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io --interactive -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3

Thanks! It works on my side as well!

@github-actions github-actions bot added the stale label Jul 20, 2024
Contributor

github-actions bot commented Aug 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Aug 3, 2024