openai.BadRequestError when running lm_eval with piqa task using vLLM's OpenAI compatible server #1735

Closed
Alnusjaponica opened this issue Apr 23, 2024 · 3 comments

Comments

@Alnusjaponica

Alnusjaponica commented Apr 23, 2024

Description

An error occurs when running lm_eval with the piqa task using vLLM's OpenAI compatible server as follows:

openai.BadRequestError: Error code: 400 - {'object': 'error', 'message': 'Cannot request more than 5 logprobs.', 'type': 'BadRequestError', 'param': None, 'code': 400}

Steps to Reproduce:

  1. Install dependencies.
    pip install vllm lm-eval[openai]==0.4.1
    
  2. Start vLLM's OpenAI compatible server with an arbitrary API key.
    python -m vllm.entrypoints.openai.api_server --model gpt2 --api-key $OPENAI_API_KEY
    
  3. Start the evaluation.
    lm_eval --model local-completions \
            --model_args model=gpt2,tokenizer_backend=huggingface,base_url=http://localhost:8000/v1 \
            --tasks piqa \
            --batch_size auto:4 \
            --verbosity DEBUG
    

Additional Information:

According to the OpenAI docs:

The maximum value for logprobs is 5.

However, lm_eval appears to request more than 5 logprobs (e.g. logprobs=10 in its completions requests).
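
For reference, the 400 error can be reproduced outside lm_eval with a single request against the local server. The command below is a minimal sketch, assuming the server from step 2 is still running with the same API key; the prompt text is arbitrary.

    # Ask for 10 logprobs; the server's limit is 5 here, so this returns the
    # 400 BadRequestError quoted above instead of a completion.
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -H "Authorization: Bearer $OPENAI_API_KEY" \
      -d '{"model": "gpt2", "prompt": "The quick brown fox", "max_tokens": 1, "logprobs": 10}'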

@Alnusjaponica changed the title from "Error when running lm_eval with piqa task using vLLM's OpenAI compatible server" to "openai.BadRequestError when running lm_eval with piqa task using vLLM's OpenAI compatible server" on Apr 24, 2024
@haileyschoelkopf
Contributor

Hi!

If you change this to logprobs=5, does it run correctly? And do the scores appear similar to what is expected / to what your model reports when run via HF?

@Alnusjaponica
Author

Alnusjaponica commented May 2, 2024

Hi, thanks for the response. In my environment, replacing logprobs=10 with logprobs=5 worked (at least, the inference tasks started), though I then got another error on the vLLM side:

...
INFO 05-02 11:25:26 async_llm_engine.py:120] Finished request cmpl-e2f9dbd8b06549af9a9ff0f5d1c8fc54-87.
ERROR 05-02 11:25:26 async_llm_engine.py:43] Engine background task failed
ERROR 05-02 11:25:26 async_llm_engine.py:43] Traceback (most recent call last):
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 38, in _raise_exception_on_finish
ERROR 05-02 11:25:26 async_llm_engine.py:43]     task.result()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 480, in run_engine_loop
ERROR 05-02 11:25:26 async_llm_engine.py:43]     has_requests_in_progress = await asyncio.wait_for(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File "/usr/local/lib/python3.10/asyncio/tasks.py", line 445, in wait_for
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return fut.result()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 454, in engine_step
ERROR 05-02 11:25:26 async_llm_engine.py:43]     request_outputs = await self.engine.step_async()
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/engine/async_llm_engine.py", line 213, in step_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = await self.model_executor.execute_model_async(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 418, in execute_model_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     all_outputs = await self._run_workers_async(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/executor/ray_gpu_executor.py", line 408, in _run_workers_async
ERROR 05-02 11:25:26 async_llm_engine.py:43]     all_outputs = await asyncio.gather(*coros)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File "/usr/local/lib/python3.10/concurrent/futures/thread.py", line 58, in run
ERROR 05-02 11:25:26 async_llm_engine.py:43]     result = self.fn(*self.args, **self.kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/worker/worker.py", line 221, in execute_model
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = self.model_runner.execute_model(seq_group_metadata_list,
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return func(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/worker/model_runner.py", line 673, in execute_model
ERROR 05-02 11:25:26 async_llm_engine.py:43]     output = self.model.sample(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/models/gpt2.py", line 240, in sample
ERROR 05-02 11:25:26 async_llm_engine.py:43]     next_tokens = self.sampler(logits, sampling_metadata)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return self._call_impl(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return forward_call(*args, **kwargs)
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 79, in forward
ERROR 05-02 11:25:26 async_llm_engine.py:43]     prompt_logprobs, sample_logprobs = _get_logprobs(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 577, in _get_logprobs
ERROR 05-02 11:25:26 async_llm_engine.py:43]     batched_ranks_query_result = _get_ranks(
ERROR 05-02 11:25:26 async_llm_engine.py:43]   File ".../venv/lib/python3.10/site-packages/vllm/model_executor/layers/sampler.py", line 525, in _get_ranks
ERROR 05-02 11:25:26 async_llm_engine.py:43]     return (x > vals[:, None]).long().sum(1).add_(1)
ERROR 05-02 11:25:26 async_llm_engine.py:43] torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 780.00 MiB. GPU 0 has a total capacty of 15.77 GiB of which 601.38 MiB is free. Process 3374765 has 4.23 GiB memory in use. Process 1144678 has 10.01 GiB memory in use. Process 1146417 has 306.00 MiB memory in use. Process 1146227 has 306.00 MiB memory in use. Process 1146057 has 306.00 MiB memory in use. Of the allocated memory 7.10 GiB is allocated by PyTorch, and 1.25 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
INFO 05-02 11:25:26 async_llm_engine.py:154] Aborted request cmpl-e2f9dbd8b06549af9a9ff0f5d1c8fc54-88.
...

@Alnusjaponica
Author

vLLM's OpenAI compatible server has a --max-logprobs option, and setting it to a large enough value suffices for my use case.
https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#command-line-arguments-for-the-server
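
For example, the server can be restarted with a higher limit (a sketch reusing the command from step 2; the value 10 matches the logprobs=10 that lm_eval requests, per the discussion above):

    # Allow up to 10 logprobs per request so lm_eval's logprobs=10 calls are accepted.
    python -m vllm.entrypoints.openai.api_server \
        --model gpt2 \
        --api-key $OPENAI_API_KEY \
        --max-logprobs 10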
