model qwen2-7b-instruct #2553
Comments
It's working fine with another model.

Same behaviour for me. Temporary workaround: `GPU_LAYERS: 0`. Further info: https://huggingface.co/bartowski/Qwen2-7B-Instruct-GGUF/discussions/1 — "You can also enable flash attention for llamacpp which should be able to work around the issue." Is flash attention already set in the current docker images?
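A minimal sketch of the temporary workaround mentioned above, disabling GPU offload via the compose environment. Only the `GPU_LAYERS: 0` setting comes from this thread; the service name, image tag, and port mapping are illustrative assumptions, not from the discussion:

```yaml
# docker-compose.yml sketch (service name, image tag, and port are hypothetical)
services:
  localai:
    image: localai/localai:latest
    ports:
      - "8080:8080"
    environment:
      GPU_LAYERS: "0"   # workaround from this thread: offload no layers to the GPU
```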
I'm using docker-compose with the latest LocalAI image directly. docker-compose.yml: `version: "3.9"` (rest of the file elided)
Have you entered `flash_attention: true` in your model yaml file?
Do you mean like this? But it still generates output like that after I restarted.

models/qwen2-7b-instruct.yaml: `root@a4681b4b3146:/build/models# cat qwen2-7b-instruct.yaml` (output elided)
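For reference, a hedged sketch of what a LocalAI model definition with flash attention enabled might look like. Only the `flash_attention: true` key comes from this thread; the model name, backend, and GGUF filename are illustrative assumptions:

```yaml
# Sketch of a LocalAI model yaml (filename and field values are hypothetical)
name: qwen2-7b-instruct
backend: llama-cpp
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf   # hypothetical GGUF file in the models dir
flash_attention: true                     # the setting suggested in this thread
```

After editing the yaml, restart the container so the model definition is reloaded.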
Yes, like that — it works for me.
@cesinsingapore did you solve your problem?
Nope, it's not solved. The AI replies still don't make sense.