feat: Configurable EOS token when using the server #2758
Sorry, this started out as a bug report (Llama 3 not working via the server), but I chose a more positive framing... That did not change the label, though 😞
Right, so I found that you can actually specify the stop token in the API call, below the messages array.
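For illustration, here is a minimal sketch of what such a request body could look like, assuming an OpenAI-compatible chat completion endpoint; the model id is a hypothetical placeholder:

```python
import json

# Sketch of an OpenAI-compatible chat completion payload that sets the
# stop token explicitly alongside the messages array. The model id is a
# placeholder; use whatever id your local Jan server reports.
payload = {
    "model": "llama3-8b-instruct",  # hypothetical model id
    "messages": [
        {"role": "user", "content": "Hello!"},
    ],
    # Passing the Llama 3 end-of-turn token here makes generation stop
    # at <|eot_id|> even when the server-side default is not configurable.
    "stop": ["<|eot_id|>"],
}

body = json.dumps(payload)
print(body)
```

Sending this body to the server's chat completion route should make it halt generation at `<|eot_id|>` for that request.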
Yes, it is quite confusing right now. We will work on the API server to have it directly communicate with model.json, so the stop token can be set by default.
This will be addressed when the Jan x Cortex integration is done @Van-QA
We are working on a new version of Jan x Cortex, so it's recommended to try it via the nightly build: https://github.com/janhq/jan?tab=readme-ov-file#download. Feel free to get back to us if the issue remains.
To properly run Llama 3 models, you need to set the stop token `<|eot_id|>`. This is currently not configurable when running Jan in API server mode; the model is automatically loaded by llama.cpp with the default EOS token. This causes the model to not stop generating when it should: it places `<|eot_id|>assistant\n\n` in its output and continues generating for several responses/turns.

Of course, a fix for llama.cpp is already in the works, and will surely come to Nitro/Cortex. Still, having this configurable would be nice.
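Until the stop token is configurable server-side, one client-side workaround is to truncate the response at the first occurrence of the token yourself. A minimal sketch, assuming the Llama 3 end-of-turn token (the token list is an assumption, not a server feature):

```python
def truncate_at_stop(text: str, stop_tokens=("<|eot_id|>",)) -> str:
    """Cut generated text at the first occurrence of any stop token.

    Client-side workaround for servers that do not honor the Llama 3
    stop token; the default token list is an assumption.
    """
    cut = len(text)
    for tok in stop_tokens:
        idx = text.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# The runaway output described above gets trimmed to the first turn.
print(truncate_at_stop("Hello!<|eot_id|>assistant\n\nMore text..."))
```

This does not stop the server from wasting tokens on the extra turns, but it keeps the visible output clean.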