generate_completions feedback #11
Comments
Do you use Q4_X (quantized weights)? If so, please test FP16 for quality and maybe check out #12
We have an example of interactive mode in chat_with_bot.py -- it could probably be extended. For other improvements, PRs are very welcome :)
generate_completions seems to be very bad at narration of any meaningful length (past about 200 words), often hallucinating or repeating passages (for the ~8 GB 14B Raven-instruct v6 model, FP16 quantized to 4-bit).
Also, general usage could easily be improved with a while loop.
It would be good to implement a simple while loop so the model does not leave RAM when someone wants to generate more, and to add some of the repetition-penalty logic that BlinkDL uses in ChatRWKV (GEN_alpha_presence = 0.2 # Presence Penalty and GEN_alpha_frequency = 0.2 # Frequency Penalty).
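A minimal sketch of both suggestions, assuming a NumPy logits array per step: the penalty follows the OpenAI-style presence/frequency formula ChatRWKV uses (subtract `alpha_presence` once plus `alpha_frequency` per prior occurrence of each generated token), and the loop keeps the loaded model in RAM between requests. The `model`, `tokenizer`, and `sample_fn` names here are hypothetical placeholders, not this repo's actual API.

```python
import numpy as np

GEN_alpha_presence = 0.2   # presence penalty (flat, once per seen token)
GEN_alpha_frequency = 0.2  # frequency penalty (scales with occurrence count)

def apply_repetition_penalty(logits, occurrence):
    """Subtract presence + frequency penalties from logits of already-generated tokens.

    occurrence maps token id -> how many times it has been generated so far.
    """
    penalized = logits.copy()
    for token, count in occurrence.items():
        penalized[token] -= GEN_alpha_presence + count * GEN_alpha_frequency
    return penalized

def interactive_loop(model, tokenizer, sample_fn, max_tokens=200):
    """Sketch of a REPL that reuses the already-loaded model (hypothetical API)."""
    while True:
        prompt = input("Prompt (empty to quit): ")
        if not prompt:
            break
        tokens = tokenizer.encode(prompt)          # hypothetical tokenizer
        occurrence, state = {}, None
        for _ in range(max_tokens):
            logits, state = model.eval(tokens, state)  # hypothetical model call
            logits = apply_repetition_penalty(logits, occurrence)
            next_token = sample_fn(logits)
            occurrence[next_token] = occurrence.get(next_token, 0) + 1
            tokens = [next_token]
            print(tokenizer.decode([next_token]), end="", flush=True)
        print()
```

With both alphas at 0.2, a token generated three times already has its logit reduced by 0.2 + 3 * 0.2 = 0.8 before sampling, which steadily discourages the repeated passages described above.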
It's pretty fast on AVX2! This is an awesome repo -- thank you for your work.