
Openai_api_server Error: 400 status code (no body) #3439

Open
GianlucaDeStefano opened this issue Jul 7, 2024 · 0 comments
GianlucaDeStefano commented Jul 7, 2024

I spawn an OpenAI-compatible server using the following docker-compose file:

version: "3"
services:
  fastchat-controller:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "21001:21001"
    entrypoint: ["python3.9", "-m", "fastchat.serve.controller", "--host", "0.0.0.0", "--port", "21001"]
  fastchat-model-worker:
    build:
      context: .
      dockerfile: Dockerfile
    volumes:
      - /home/gianluca/.cache/huggingface:/root/.cache/huggingface
    image: fastchat:latest
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 2
              capabilities: [gpu]
    ipc: host
    entrypoint: ["python3.9", "-m", "fastchat.serve.vllm_worker", "--model-path", "microsoft/Phi-3-mini-4k-instruct", "--worker-address", "https://fastchat-model-worker:21002", "--controller-address", "https://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "21002", "--num-gpus","2"]
  fastchat-api-server:
    build:
      context: .
      dockerfile: Dockerfile
    image: fastchat:latest
    ports:
      - "8000:8000"
    entrypoint: ["python3.9", "-m", "fastchat.serve.openai_api_server", "--controller-address", "https://fastchat-controller:21001", "--host", "0.0.0.0", "--port", "8000"]

The Dockerfile is:

FROM nvidia/cuda:12.2.0-runtime-ubuntu20.04

RUN apt-get update -y && apt-get install -y python3.9 python3.9-distutils curl
RUN curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
RUN python3.9 get-pip.py
RUN pip3 install fschat vllm
RUN pip3 install fschat[model_worker,webui]

Everything works until the prompt length gets close to 4000 tokens, i.e. the size of the model's context window.
As I approach that limit I keep getting the following error back: Error: 400 status code (no body).
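One plausible cause (an assumption based on how vLLM enforces length limits, not something the error message confirms): the server validates prompt tokens *plus* the requested max_tokens against the context window, so a prompt that fits on its own can still be rejected with a 400 once the completion budget is added. A minimal sketch of that kind of check, with illustrative numbers:

```python
def request_fits(prompt_tokens: int, max_tokens: int, context_window: int = 4096) -> bool:
    """Mimic the length check a vLLM-backed server may apply:
    the prompt plus the completion budget must fit in the context window."""
    return prompt_tokens + max_tokens <= context_window

# A 3900-token prompt alone fits in a 4096-token window...
print(request_fits(3900, 0))    # True
# ...but with a 512-token completion budget the request exceeds the window.
print(request_fits(3900, 512))  # False
```

If this is what is happening, lowering max_tokens in the request (or truncating the prompt further) should make the 400 disappear.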
Could someone help me debug this? The prompt is still shorter than the context window, and the error message is not useful. Is there a debug mode I can enable to gather more information about what is happening in the backend?
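One way to see past "400 status code (no body)" is to bypass the OpenAI client and make the HTTP call directly: the server typically does return a JSON body explaining the rejection, which the client error hides. A hedged sketch, assuming the api-server from the compose file above is reachable on localhost:8000:

```python
import json
import urllib.error
import urllib.request

def post_chat(url: str, payload: dict) -> tuple[int, str]:
    """POST a chat-completion request and return (status, body),
    including the body of a 4xx response that urllib raises as HTTPError."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read().decode()
    except urllib.error.HTTPError as e:
        # HTTPError carries the error body that the client message discards.
        return e.code, e.read().decode()

payload = {
    "model": "microsoft/Phi-3-mini-4k-instruct",
    "messages": [{"role": "user", "content": "long prompt here..."}],
    "max_tokens": 512,
}
# With the server running, uncomment to inspect the real error body:
# status, body = post_chat("http://localhost:8000/v1/chat/completions", payload)
# print(status, body)
```

The returned body should state the actual reason for the 400, which is more actionable than the client-side message.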

@GianlucaDeStefano GianlucaDeStefano changed the title Openai_api_server error 400 (no body) Openai_api_server Error: 400 status code (no body) Jul 7, 2024