
IndexError: list index out of range when running benchmark on gguf model #1768

Open
fherrmannsdoerfer opened this issue Apr 30, 2024 · 2 comments

Comments


I am running a llama.cpp server on localhost with the phi-2_16.gguf model (as described here: #1254).
When I run the lm-eval test suite (version 0.4.2) with the following command:

lm_eval --model gguf --tasks winogrande --model_args base_url=https://localhost:8080 --output_path ./output/phi-2/winogrande --verbosity DEBUG --num_fewshot 5

the execution breaks during the loglikelihood requests (on the 259th request) with the following exception:

File "<path_to_repository>\lm-evaluation-harness\lm_eval\__main__.py", line 341, in cli_evaluate
    results = evaluator.simple_evaluate(
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\evaluator.py", line 251, in simple_evaluate
    results = evaluate(
              ^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\evaluator.py", line 390, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\models\gguf.py", line 89, in loglikelihood
    logprob, is_greedy = get_result(logprobs, len(context))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<path_to_repository>\lm-evaluation-harness\lm_eval\models\gguf.py", line 22, in get_result
    while offsets[idx] < context_length:
          ~~~~~~~^^^^^
IndexError: list index out of range

I can't figure out what the problem is. Am I using this incorrectly?
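
For reference, the failing loop in get_result from lm_eval/models/gguf.py looks roughly like the sketch below (reconstructed from the traceback; the actual code may differ in details):

    def get_result(logprobs, context_length):
        # Simplified sketch of get_result in lm_eval/models/gguf.py.
        # "logprobs" is the logprobs object returned by the llama.cpp server's
        # completion endpoint; context_length is len(context) in characters.
        is_greedy = True
        offsets = logprobs["text_offset"]        # character offset of each token
        tokens = logprobs["tokens"]
        tokens_logprobs = logprobs["token_logprobs"]

        # Find the first token that belongs to the continuation rather than the
        # context. If the server's text_offset values are shifted relative to the
        # prompt string, no offset ever reaches context_length and idx runs past
        # the end of the list, producing the IndexError above.
        idx = 0
        while offsets[idx] < context_length:
            idx += 1

        continuation_logprobs = sum(tokens_logprobs[idx:-1])
        for i in range(idx, len(tokens)):
            token = tokens[i]
            top_tokens = logprobs["top_logprobs"][i]
            top_token = max(top_tokens.keys(), key=lambda t: top_tokens[t])
            if top_token != token:
                is_greedy = False
                break
        return continuation_logprobs, is_greedy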


02Bigboy commented May 9, 2024

I'm hitting a similar issue.
I'm using lm-eval (0.3.2) to run agieval:

lm_eval --model vllm \
    --model_args pretrained=${model},trust_remote_code=True,tokenizer_mode="slow",tensor_parallel_size=8,dtype=auto,gpu_memory_utilization=0.8 \
    --tasks agieval \
    --batch_size auto \
    --output_path ${x} \
    --num_fewshot 5 \
    --device cuda

and I get:
Traceback (most recent call last):
  File "/usr/local/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "<path_to_repository>/lm_eval/__main__.py", line 342, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "<path_to_repository>/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "<path_to_repository>/lm_eval/evaluator.py", line 234, in simple_evaluate
    results = evaluate(
  File "<path_to_repository>/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "<path_to_repository>/lm_eval/evaluator.py", line 325, in evaluate
    task.build_all_requests(
  File "<path_to_repository>/lm_eval/api/task.py", line 418, in build_all_requests
    fewshot_ctx = self.fewshot_context(
  File "<path_to_repository>/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
  File "<path_to_repository>/lm_eval/api/task.py", line 950, in fewshot_context
    labeled_examples = description + self.sampler.get_context(doc, num_fewshot)
  File "<path_to_repository>/lm_eval/api/samplers.py", line 37, in get_context
    [
  File "<path_to_repository>/lm_eval/api/samplers.py", line 49, in <listcomp>
    str(self.doc_to_target(doc)[0])
IndexError: list index out of range
I don't know the reason, but when I change --num_fewshot to 0 it works. I don't understand why num_fewshot can't be 5.
Also, can someone explain how to set num_fewshot for different datasets?
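
For context, the few-shot sampler builds the labeled examples roughly as in the sketch below (a simplified, self-contained illustration; the real lm_eval/api/samplers.py code differs in details):

    # Simplified illustration of the failing code path in lm_eval/api/samplers.py.
    # doc_to_text / doc_to_target mirror the names in the traceback; the real
    # ContextSampler.get_context method has more logic around them.
    def build_fewshot_context(fewshot_docs, doc_to_text, doc_to_target):
        return (
            "\n\n".join(
                [
                    doc_to_text(d)
                    + " "
                    # The failing expression: doc_to_target(d) is indexed with [0],
                    # so it is expected to return a non-empty list. If a task's
                    # doc_to_target yields an empty list for a sampled few-shot doc,
                    # this raises IndexError: list index out of range. With
                    # --num_fewshot 0 no few-shot docs are sampled, so this code
                    # path is skipped entirely, which is why that setting works.
                    + str(doc_to_target(d)[0])
                    for d in fewshot_docs
                ]
            )
            + "\n\n"
        )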


a-ghorbani commented May 19, 2024

I faced a similar issue.
In my case, the problem was that llama-cpp-python assumes the prompt will always include a BOS token when preparing the text_offset for logprobs.
This resolved my issue: abetlen/llama-cpp-python#1471.
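
To illustrate the kind of mismatch described there (toy values, not the actual llama-cpp-python code): the harness compares the server's text_offset entries against len(context) of the raw prompt string, so if every reported offset stays below len(context), which can happen when the offsets are computed assuming an extra BOS token, the loop in get_result never stops and runs off the end of the list:

    # Toy illustration of the offset mismatch (made-up values; not the real
    # llama-cpp-python logic).
    context = "Question: 2+2=? Answer:"     # what lm-eval sends as the context
    continuation = " 4"
    prompt = context + continuation

    # Hypothetical shifted offsets: every entry is smaller than len(context),
    # so the search for the first continuation token never terminates normally.
    text_offset = [0, 4, 9, 13, 16, 20]

    idx = 0
    while idx < len(text_offset) and text_offset[idx] < len(context):
        idx += 1
    # Without the explicit bounds check added above (get_result only tests
    # offsets[idx] < context_length), idx would run past the end of the list
    # and raise IndexError: list index out of range, as in the original traceback.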
