openai.BadRequestError when running lm_eval with piqa task using vLLM's OpenAI compatible server #1735

Closed
Alnusjaponica opened this issue Apr 23, 2024 · 3 comments

Comments

@Alnusjaponica

Alnusjaponica commented Apr 23, 2024

Description

An error occurs when running lm_eval with the piqa task using vLLM's OpenAI compatible server as follows:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Cannot request more than 5 logprobs.', 'type': 'BadRequestError', 'param': None, 'code': 400}

Steps to Reproduce:

  1. Install dependencies.
    pip install vllm lm-eval[openai]==0.4.1
    
  2. Start vLLM's OpenAI compatible server with an arbitrary API key.
    python -m vllm.entrypoints.openai.api_server --model gpt2 --api-key $OPENAI_API_KEY
    
  3. Start the evaluation.
    lm_eval --model local-completions \
            --model_args model=gpt2,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1 \
            --tasks piqa \
            --batch_size auto:4 \
            --verbosity DEBUG
    

Additional Information:

According to the OpenAI docs:

The maximum value for logprobs is 5.

However, lm_eval appears to request more than 5 logprobs (e.g. logprobs=10 in its completions requests).
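
For reference, the 400 error can be reproduced outside lm_eval with a single request against the local server. The command below is a minimal sketch, assuming the server from step 2 is still running with the same API key; the prompt text is arbitrary.

    # Ask for 10 logprobs; the server's limit is 5 here, so this returns the
    # 400 BadRequestError quoted above instead of a completion.
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{"model": "gpt2", "prompt": "The quick brown fox", "max_tokens": 1, "logprobs": 10}'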

@Alnusjaponica changed the title from "Error when running lm_eval with piqa task using vLLM's OpenAI compatible server" to "openai.BadRequestError when running lm_eval with piqa task using vLLM's OpenAI compatible server" on Apr 24, 2024
@haileyschoelkopf
Contributor

Hi!

If you change this to logprobs=5, does it run correctly? And do the scores appear similar to what is expected / to what your model reports when run via HF?

@Alnusjaponica
Author

Alnusjaponica commented May 2, 2024

Hi, thanks for the response. In my environment, replacing logprobs=10 with logprobs=5 worked (at least, the inference tasks started), though I then got another error on the vLLM side:

...
INFO 05-02 11:25:26 async_llm_engine.py:120] Finished request cmpl-e2f9dbd8b06549af9a9ff0f5d1c8fc54-87.
ERROR 05-02 11:25:26 async_llm_engine.py:43] Engine background task failed
ERROR 05-02 11:25:26 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 05-02 11:25:26 async_llm_engine.py:43]     task.result()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
ERROR 05-02 11:25:26 async_llm_engine.py:43]     has_requests_in_progress = await asyncio.wait_for(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return fut.result()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
ERROR 05-02 11:25:26 async_llm_engine.py:43]     request_outputs = await self.engine.step_async()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = await self.model_executor.execute_model_async(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     all_outputs = await self._run_workers_async(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     all_outputs = await asyncio.gather(*coros)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 05-02 11:25:26 async_llm_engine.py:43]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 221, in execute_model
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 673, in execute_model
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = self.model.sample(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 240, in sample
ERROR 05-02 11:25:26 async_llm_engine.py:43]     next_tokens = self.sampler(logits, sampling_metadata)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return self._call_impl(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return forward_call(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 79, in forward
ERROR 05-02 11:25:26 async_llm_engine.py:43]     prompt_logprobs, sample_logprobs = _get_logprobs(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 577, in _get_logprobs
ERROR 05-02 11:25:26 async_llm_engine.py:43]     batched_ranks_query_result = _get_ranks(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 525, in _get_ranks
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return (x > vals[:, None]).long().sum(1).add_(1)
ERROR 05-02 11:25:26 async_llm_engine.py:43] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 780.00 MiB. GPU 0 has a total capacty of 15.77 GiB of which 601.38 MiB is free. Process 3374765 has 4.23 GiB memory in use. Process 1144678 has 10.01 GiB memory in use. Process 1146417 has 306.00 MiB memory in use. Process 1146227 has 306.00 MiB memory in use. Process 1146057 has 306.00 MiB memory in use. Of the allocated memory 7.10 GiB is allocated by PyTorch, and 1.25 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
INFO 05-02 11:25:26 async_llm_engine.py:154] Aborted request cmpl-e2f9dbd8b06549af9a9ff0f5d1c8fc54-88.
...

@Alnusjaponica
Author

vLLM's OpenAI compatible server has a --max-logprobs option, and setting it to a large enough value suffices for my use case.
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server
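
For example, the server can be restarted with a higher limit (a sketch reusing the command from step 2; the value 10 matches the logprobs=10 that lm_eval requests, per the discussion above):

    # Allow up to 10 logprobs per request so lm_eval's logprobs=10 calls are accepted.
    python -m vllm.entrypoints.openai.api_server \
        --model gpt2 \
        --api-key $OPENAI_API_KEY \
        --max-logprobs 10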
