generate_completions feedback #11
Comments
Do you use Q4_X (quantized weights)? If so, please test FP16 for quality and maybe check out #12
We have an example of interactive mode in chat_with_bot.py -- it could probably be extended. For other improvements, PRs are very welcome :)
generate_completions seems to be very bad at narration of any meaningful length (past about 200 words), often hallucinating or repeating passages (for the ~8 GB 14B Raven-instruct v6 model, FP16 quantized to 4-bit).
Also, general usage could easily be improved with a while loop.
It would be good to implement a simple while loop so the model does not leave RAM when someone wants to generate more, and to add some of the repetition-penalty logic that BlinkDL uses in ChatRWKV (GEN_alpha_presence = 0.2 # Presence Penalty and GEN_alpha_frequency = 0.2 # Frequency Penalty).
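A minimal sketch of both suggestions, assuming a NumPy logits array per step: the penalty follows the OpenAI-style presence/frequency formula ChatRWKV uses (subtract `alpha_presence` once plus `alpha_frequency` per prior occurrence of each generated token), and the loop keeps the loaded model in RAM between requests. The `model`, `tokenizer`, and `sample_fn` names here are hypothetical placeholders, not this repo's actual API.

```python
import numpy as np

GEN_alpha_presence = 0.2   # presence penalty (flat, once per seen token)
GEN_alpha_frequency = 0.2  # frequency penalty (scales with occurrence count)

def apply_repetition_penalty(logits, occurrence):
    """Subtract presence + frequency penalties from logits of already-generated tokens.

    occurrence maps token id -> how many times it has been generated so far.
    """
    penalized = logits.copy()
    for token, count in occurrence.items():
        penalized[token] -= GEN_alpha_presence + count * GEN_alpha_frequency
    return penalized

def interactive_loop(model, tokenizer, sample_fn, max_tokens=200):
    """Sketch of a REPL that reuses the already-loaded model (hypothetical API)."""
    while True:
        prompt = input("Prompt (empty to quit): ")
        if not prompt:
            break
        tokens = tokenizer.encode(prompt)          # hypothetical tokenizer
        occurrence, state = {}, None
        for _ in range(max_tokens):
            logits, state = model.eval(tokens, state)  # hypothetical model call
            logits = apply_repetition_penalty(logits, occurrence)
            next_token = sample_fn(logits)
            occurrence[next_token] = occurrence.get(next_token, 0) + 1
            tokens = [next_token]
            print(tokenizer.decode([next_token]), end="", flush=True)
        print()
```

With both alphas at 0.2, a token generated three times already has its logit reduced by 0.2 + 3 * 0.2 = 0.8 before sampling, which steadily discourages the repeated passages described above.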
It's pretty fast on AVX2! This is an awesome repo -- thank you for your work.