Error with VLLM #1136
Comments
- Could it make sense to look at the container logs?
- "INFO 06-12 12:36:44 api_server.py:177] vLLM API server version 0.5.0" — the above is the log from when I use it; you can see there is no log entry for this request.
- And I found that I can use it as an OpenAI API now; the point is:
- I will try the OpenAI API with vLLM.
- The solution worked for Hugging Face models, which could be updated into:
I started vLLM via Docker; the command is:
docker run --runtime nvidia --gpus all -d --restart always \
  -v ~/data/.cache/huggingface:/root/.cache/huggingface \
  -v /data/LLM_models/Qwen/Qwen2-72B-Instruct-GPTQ-Int4:/data/Qwen2-72B-Instruct-GPTQ-Int4 \
  -p 8000:8000 --ipc=host vllm/vllm-openai:latest \
  --served-model-name Qwen2-72B-Instruct-GPTQ-Int4 \
  --model /data/Qwen2-72B-Instruct-GPTQ-Int4 \
  --tensor-parallel-size 4
Do the curl test:
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{ "model": "Qwen2-72B-Instruct-GPTQ-Int4", "messages": [ {"role": "system", "content": "You are a helpful assistant."}, {"role": "user", "content": "give me the answer of 1+1"} ] }'
It works well:
{"id":"cmpl-e44f7f809f5b4ebc82eff1e96c55ad1b","object":"chat.completion","created":1718184574,"model":"Qwen2-72B-Instruct-GPTQ-Int4","choices":[{"index":0,"message":{"role":"assistant","content":"The answer to 1 + 1 is 2.","tool_calls":[]},"logprobs":null,"finish_reason":"stop","stop_reason":null}],"usage":{"prompt_tokens":28,"total_tokens":41,"completion_tokens":13}}
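Since the curl test succeeds but the client later reports that the model "does not exist", one quick sanity check is to ask the server which model names it is actually serving via `GET http://localhost:8000/v1/models` (e.g. `curl http://localhost:8000/v1/models`) and confirm the name passed to the client matches exactly. The sketch below is illustrative only: the helper function is not part of vLLM or DSPy, and the sample payload is trimmed to the fields being checked, assuming the standard OpenAI-style list shape that vLLM's OpenAI-compatible server returns.

```python
import json

def served_model_names(models_payload: dict) -> list:
    """Illustrative helper: extract model ids from a /v1/models response."""
    return [m.get("id", "") for m in models_payload.get("data", [])]

# Sample payload in the OpenAI-style list shape (trimmed; the real
# response carries additional keys such as "created" and "owned_by"):
sample = json.loads(
    '{"object": "list",'
    ' "data": [{"id": "Qwen2-72B-Instruct-GPTQ-Int4", "object": "model"}]}'
)

names = served_model_names(sample)
# The model name given to the client must match one of these exactly,
# including case and any path prefix.
print(names)
```

If the name listed here differs from the one passed to `dspy` (for example, it includes the full `/data/...` path because `--served-model-name` was not applied), the 404 "model does not exist" error follows directly.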
Then I use it in DSPy like:
vllm_qwen = dspy.HFClientVLLM(model="Qwen2-72B-Instruct-GPTQ-Int4", port=8000, url="http://localhost")
The error is:
Failed to parse JSON response: {"object":"error","message":"The model Qwen2-72B-Instruct-GPTQ-Int4 does not exist.","type":"NotFoundError","param":null,"code":404}
Then I tried to use it like this:
vllm_qwen = dspy.OpenAI(model="Qwen2-72B-Instruct-GPTQ-Int4", api_base="http://localhost:8000/v1", api_key='EMPTY')
Got another error:
openai.NotFoundError: Error code: 404 - {'detail': 'Not Found'}
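This second 404 carries `{'detail': 'Not Found'}`, the server's generic response for a path it does not expose, rather than vLLM's model-lookup error above. One possible cause (an assumption, not confirmed by the logs here) is the base URL being joined into a wrong request path, e.g. a missing or doubled `/v1` segment. The helper below is purely illustrative, not part of DSPy or vLLM; it shows how a base URL can be normalized so the final path matches the endpoint the successful curl used.

```python
def chat_completions_url(api_base: str) -> str:
    """Illustrative helper (hypothetical, not a DSPy/vLLM API): build the
    chat-completions path from an OpenAI-style base URL, tolerating a
    missing '/v1' segment or a trailing slash."""
    base = api_base.rstrip("/")
    if not base.endswith("/v1"):
        base += "/v1"
    return base + "/chat/completions"

# All of these resolve to the same endpoint the successful curl hit:
print(chat_completions_url("http://localhost:8000"))
print(chat_completions_url("http://localhost:8000/v1"))
print(chat_completions_url("http://localhost:8000/v1/"))
```

Comparing the path the client actually requests (visible in the vLLM container logs) against `http://localhost:8000/v1/chat/completions` would confirm or rule out this cause.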