model qwen2-7b-instruct #2553

Open · cesinsingapore opened this issue Jun 12, 2024 · 8 comments
Labels: bug (Something isn't working), unconfirmed

@cesinsingapore

[Screenshot: garbled model output]

The AI reply doesn't make sense.

cesinsingapore added the bug (Something isn't working) and unconfirmed labels on Jun 12, 2024
@cesinsingapore (Author)

It's working fine with another model.

@AlexM4H commented Jun 13, 2024

Same behaviour for me.

Temporary workaround: GPU_LAYERS: 0 (sketch below).
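For anyone else hitting this, a minimal sketch of where that override could live, assuming the AIO image honors a GPU_LAYERS environment variable as this comment implies (the per-model gpu_layers option in the model YAML should have the same effect):

```yaml
# Sketch only: force CPU inference by offloading zero layers to the GPU.
# Assumes GPU_LAYERS is read from the container environment, as suggested above.
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    environment:
      - DEBUG=true
      - GPU_LAYERS=0   # workaround: no GPU offload until the Qwen2 issue is fixed
```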

Further info:

https://huggingface.co/bartowski/Qwen2-7B-Instruct-GGUF/discussions/1

"You can also enable flash attention for llamacpp which should be able to work around the issue"

Is flash attention already enabled in the current docker images?

@cesinsingapore (Author)

I'm using docker-compose with the latest LocalAI image directly.

docker-compose.yml:

```yaml
version: "3.9"
services:
  api:
    image: localai/localai:latest-aio-gpu-nvidia-cuda-12
    healthcheck:
      # LocalAI serves plain HTTP on port 8080
      test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
      interval: 1m
      timeout: 20m
      retries: 5
    ports:
      - 8080:8080
    environment:
      - DEBUG=true
      # ...
    volumes:
      - ./models:/build/models:cached
    # uncomment the following piece if running with Nvidia GPUs
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```

@AlexM4H commented Jun 14, 2024

Have you entered flash_attention: true in your model yaml file?
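For reference, a minimal sketch of where that flag sits in a LocalAI model file; the full config follows in the next comment, so treat this as illustrative placement only:

```yaml
# Illustrative: flash_attention is a top-level key in the model config.
name: qwen2-7b-instruct
flash_attention: true   # enable llama.cpp flash attention for this model
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf
```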

@cesinsingapore (Author) commented Jun 14, 2024

Do you mean like this? It still generates output like that after I restarted.

models/qwen.yaml:

root@a4681b4b3146:/build/models# cat qwen2-7b-instruct.yaml

```yaml
context_size: 4096
f16: true
mmap: true
name: qwen2-7b-instruct
flash_attention: true
parameters:
  model: Qwen2-7B-Instruct-Q4_K_M.gguf
stopwords:
- <|im_end|>
template:
  chat: |
    {{.Input -}}
    <|im_start|>assistant
  chat_message: |
    <|im_start|>{{ .RoleName }}
    {{ if .FunctionCall -}}
    Function call:
    {{ else if eq .RoleName "tool" -}}
    Function response:
    {{ end -}}
    {{ if .Content -}}
    {{.Content }}
    {{ end -}}
    {{ if .FunctionCall -}}
    {{toJson .FunctionCall}}
    {{ end -}}<|im_end|>
  completion: |
    {{.Input}}
  function: |
    <|im_start|>system
    You are a function calling AI model. You are provided with functions to execute. You may call one or more functions to assist with the user query. Don't make assumptions about what values to plug into functions. Here are the available tools:
    {{range .Functions}}
    {'type': 'function', 'function': {'name': '{{.Name}}', 'description': '{{.Description}}', 'parameters': {{toJson .Parameters}} }}
    {{end}}
    For each function call return a json object with function name and arguments
    <|im_end|>
    {{.Input -}}
    <|im_start|>assistant
```
@AlexM4H commented Jun 14, 2024

Yes, exactly like that; it works for me with that setting.

@AlexM4H commented Jun 17, 2024

@cesinsingapore did you solve your problem?

@cesinsingapore (Author)

Nope, it's still not solved.
