
Bug: The output of llama-cli is not the same as the output of llama-server #7973

Closed
ztrong-forever opened this issue Jun 17, 2024 · 5 comments
Labels
bug-unconfirmed · low severity (used to report low severity bugs in llama.cpp, e.g. cosmetic issues, non-critical UI glitches) · stale

Comments

@ztrong-forever

What happened?

run llama-cli:

./bin/llama-cli -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -n 512 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt

  • output: Here I get the desired result
    [screenshot]

run llama-server:

./bin/llama-server -m ./models/Meta-Llama-3-8B-Instruct.Q2_K.gguf -c 2048

  • nodejs code: [screenshot] (a sketched reconstruction follows after this list)
  • output: The results here are confusing. How do I make them consistent?
    [screenshot]
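
Since the nodejs code only survives as a screenshot, here is a minimal sketch of what such a request to llama-server's /completion endpoint might look like; the prompt text and field values are assumptions chosen to mirror the llama-cli flags above (run with Node 18+ as an ES module):

```js
// Hedged sketch: POST the same prompt and sampling settings that
// llama-cli received to llama-server's /completion endpoint.
// The prompt below is an assumed stand-in for prompts/chat-with-bob.txt.
const response = await fetch("http://127.0.0.1:8080/completion", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    prompt:
      "Transcript of a dialog where the User interacts with an assistant named Bob.\n" +
      "User: Hello, Bob.\nBob:",
    n_predict: 512,      // mirrors -n 512
    repeat_penalty: 1.0, // mirrors --repeat_penalty 1.0
    stop: ["User:"],     // mirrors the -r "User:" reverse prompt
  }),
});
const data = await response.json();
console.log(data.content);
```

Note that /completion treats the prompt as raw text: if the model expects the Llama 3 instruct template, a raw Bob-style prompt will produce different output than a template-formatted request, which is likely the inconsistency reported here.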

Name and Version

llama-cli:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

llama-server:
version: 3164 (df68d4f)
built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu

What operating system are you seeing the problem on?

Linux

Relevant log output

No response

ztrong-forever added the bug-unconfirmed and low severity labels on Jun 17, 2024
@dspasyuk
Contributor

dspasyuk commented Jun 17, 2024

I have been told that I need to use a model-specific prompt format for instruct models, which I use in my config, but it still does not work with Llama 3 Instruct. I am still waiting for a reply; see here: #7929 (comment)

@ztrong-forever
Author

> I have been told that I need to use a model-specific prompt format for instruct models, which I use in my config, but it still does not work with Llama 3 Instruct. I am still waiting for a reply; see here: #7929 (comment)

I would like to know whether you have tried comparing the results of llama-cli and llama-server.

@dspasyuk
Contributor

dspasyuk commented Jun 18, 2024

@ztrong-forever llama-server seems to work fine if you select the right "prompt style" (llama3 in this case). If llama-cli is run with a small context size like 512, it stops outputting anything once the context window is filled; the server, once the context window is filled, just prints an empty line, slashes, or other strange things:

Here is how I run the server: ./llama-server -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --gpu-layers 35 -c 512, then in the new UI select llama 3.

Screencast.from.2024-06-18.04.17.48.PM.webm

Here is the command for llama-cli:

llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io --interactive -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3
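
For callers hitting the server from code rather than the web UI, the analogous fix is to let the server apply the chat template instead of sending a raw prompt. A minimal sketch, assuming the server was started as above and exposes the OpenAI-compatible endpoint on the default port 8080 (the message contents are placeholder assumptions):

```js
// Hedged sketch: the /v1/chat/completions endpoint formats the messages
// with the model's chat template (Llama 3 here) before inference,
// matching what --chat-template llama3 does for llama-cli.
const res = await fetch("http://127.0.0.1:8080/v1/chat/completions", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Hello!" },
    ],
    temperature: 0.3, // mirrors --temp 0.3 from the llama-cli command
  }),
});
const json = await res.json();
console.log(json.choices[0].message.content);
```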

@ztrong-forever
Author

> @ztrong-forever llama-server seems to work fine if you select the right "prompt style" (llama3 in this case). If llama-cli is run with a small context size like 512, it stops outputting anything once the context window is filled; the server, once the context window is filled, just prints an empty line, slashes, or other strange things:
>
> Here is how I run the server: ./llama-server -m ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --gpu-layers 35 -c 512, then in the new UI select llama 3.
>
> Screencast.from.2024-06-18.04.17.48.PM.webm
>
> Here is the command for llama-cli:
>
> llama.cpp/llama-cli --model ../../models/meta-llama-3-8b-instruct_q5_k_s.gguf --n-gpu-layers 35 -cnv --interactive-first --simple-io --interactive -b 512 --ctx_size 512 --temp 0.3 --top_k 10 --multiline-input --repeat_penalty 1.12 -t 6 --chat-template llama3

Thanks! It works on my side as well!

@github-actions github-actions bot added the stale label Jul 20, 2024
Contributor

github-actions bot commented Aug 3, 2024

This issue was closed because it has been inactive for 14 days since being marked as stale.

@github-actions github-actions bot closed this as completed Aug 3, 2024