[Bug]: Random model output using sglang backend server #535
Comments
Did you follow the chat template of Yi-1.5-6B-Chat? I think it uses a different one from Llama's.
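A quick way to compare the two templates is to render the same conversation through each tokenizer's `apply_chat_template`. This is a minimal sketch, not from the thread; the Llama checkpoint name is an assumption, since the issue never names the exact model:

```python
from transformers import AutoTokenizer

messages = [{"role": "user", "content": "What is the capital of France?"}]

# Each model ships its own chat template in tokenizer_config.json,
# so the rendered prompt strings differ between Yi-1.5 and Llama.
for name in ["01-ai/Yi-1.5-6B-Chat", "meta-llama/Llama-2-7b-chat-hf"]:
    tok = AutoTokenizer.from_pretrained(name)
    prompt = tok.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    print(f"--- {name} ---\n{prompt}")
```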
I have an update on this issue. I tested the same code on an A100 and the model does not hallucinate; the output is normal. I suspect it is related to some feature that is not disabled completely on V100. I am still investigating this.
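One guess worth ruling out here (my own, not confirmed in this thread): the V100 is compute capability 7.0 and has no native bfloat16 support, which is a classic cause of garbage output when a bf16 checkpoint is served without forcing float16. A minimal check:

```python
import torch

# V100 reports compute capability (7, 0); bfloat16 needs (8, 0)+ (Ampere).
major, minor = torch.cuda.get_device_capability(0)
print(f"compute capability: {major}.{minor}")
print(f"bf16 supported:     {torch.cuda.is_bf16_supported()}")
# If this prints False, try launching the server with an explicit
# float16 dtype (e.g. a --dtype float16 flag; exact flag name assumed).
```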
The description of the bug:
I am using AWS P3 instances with 4 V100 GPUs; the system configuration is in the section below. I ran the example from the README: in one tmux window I launched the server, and in another tmux window I ran the client example, changing only the port to match. However, instead of a sensible answer I got random, garbled output (a hedged reconstruction of the two commands is sketched below).
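The exact commands are not shown above; the following is a hedged reconstruction of the sglang README quick-start of the time, with the model name, port, and prompts as assumptions:

```bash
# Window 1: launch the sglang server (model path and port assumed).
python -m sglang.launch_server --model-path meta-llama/Llama-2-7b-chat-hf --port 30000
```

```python
# Window 2: query the server through the sglang frontend
# (endpoint URL assumed to match the port above).
import sglang as sgl

@sgl.function
def multi_turn_question(s, question_1, question_2):
    s += sgl.user(question_1)
    s += sgl.assistant(sgl.gen("answer_1", max_tokens=256))
    s += sgl.user(question_2)
    s += sgl.assistant(sgl.gen("answer_2", max_tokens=256))

sgl.set_default_backend(sgl.RuntimeEndpoint("http://localhost:30000"))

state = multi_turn_question.run(
    question_1="What is the capital of the United Kingdom?",
    question_2="List two local attractions.",
)
print(state["answer_1"])
print(state["answer_2"])
```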
I tried the same Llama model with vllm and it gave me reasonable answers.
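For comparison, serving the same model through vllm's OpenAI-compatible server would look roughly like this (model name and port are assumptions):

```bash
# Serve the same checkpoint with vllm on the same hardware...
python -m vllm.entrypoints.openai.api_server \
    --model meta-llama/Llama-2-7b-chat-hf --port 8000

# ...then send an identical request and compare the completions.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "meta-llama/Llama-2-7b-chat-hf",
         "prompt": "The capital of France is",
         "max_tokens": 32}'
```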
I also tried a different model from Hugging Face, 01-ai/Yi-1.5-6B-Chat, but I got random results as well. I am uncertain what is going wrong. Currently I am trying to change the tokenizer and also to run on an A100 to see whether the problem persists. Any suggestions on what could cause the problem are very welcome. Thanks!
System configuration
I collected this using the collect_env.py script from vllm.
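For anyone reproducing this, the script can be fetched straight from the vllm repository and run on the affected machine (the raw URL is an assumption about where the file currently lives):

```bash
wget https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
python collect_env.py
```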