Batch size auto is wrong? #1323

Closed
djstrong opened this issue Jan 19, 2024 · 8 comments · Fixed by #1405

Comments

@djstrong
Contributor

Batch size auto is not working correctly (with generate_until tasks?).

lm_eval --model hf --model_args pretrained=HuggingFaceH4/zephyr-7b-alpha,dtype=bfloat16 --tasks polemo2_in --device cuda:0 --batch_size auto

Traceback (most recent call last):
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/__main__.py", line 231, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/evaluator.py", line 150, in simple_evaluate
    results = evaluate(
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/evaluator.py", line 325, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/models/huggingface.py", line 1051, in generate_until
    batch_size = self._detect_batch_size()
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/models/huggingface.py", line 610, in _detect_batch_size
    batch_size = forward_batch()
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/venv/lib/python3.10/site-packages/accelerate/utils/memory.py", line 134, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.

The benchmark works with a batch size of at least 2 (on a 40 GB VRAM card).

@haileyschoelkopf
Collaborator

Thanks for reporting this, will check it out!

I suspect that this is because Mistral has a max length of 32768 in its config, and out of caution we calculate our max batch size based on the model's reported max length, to ensure no OOMs will occur at that batch size.

You should be able to get around this with --batch_size auto by also passing --model_args max_length=4096, or some other value that is greater than the length of the documents in this task.
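
For example, reusing the command from the report above (the max_length value here is only illustrative), this would look something like:

lm_eval --model hf --model_args pretrained=HuggingFaceH4/zephyr-7b-alpha,dtype=bfloat16,max_length=4096 --tasks polemo2_in --device cuda:0 --batch_size auto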

@djstrong
Contributor Author

Thank you! I had thought the batch size calculation was based on the actual data (especially since it can be recalculated during evaluation).

@haileyschoelkopf
Collaborator

I believe the first instance of batch size calculation is anomalous in this respect (which is perhaps worth changing).

@pminervini
Contributor

@haileyschoelkopf could it make sense to just set the auto batch size to one rather than raising "No executable batch size found, reached zero."?

@djstrong
Contributor Author

djstrong commented Feb 1, 2024

Another option is to just run with a defined batch size and, if an OOM occurs, decrease it and repeat.
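
A rough sketch of that decrease-and-retry strategy (a hypothetical helper, not the harness's actual code) could look like this:

    import torch

    def run_with_retry(run_fn, batch_size):
        # Halve the batch size on a CUDA OOM until the call succeeds or we reach zero.
        while batch_size >= 1:
            try:
                return run_fn(batch_size)
            except torch.cuda.OutOfMemoryError:
                torch.cuda.empty_cache()
                batch_size //= 2
        raise RuntimeError("No batch size worked, even batch_size=1.")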

@pminervini
Contributor

pminervini commented Feb 1, 2024

A potential hacky workaround:

    batch_size = "auto"
    try:
        results = run_evaluation(eval_request=eval_request, task_names=[task.benchmark], num_fewshot=task.num_fewshot,
                                 batch_size=batch_size, device=DEVICE, use_cache=None, limit=LIMIT)
    except RuntimeError as e:
        if "No executable batch size found" in str(e):
            # Auto-detection reached zero; retry with a fixed batch size of 1.
            batch_size = 1
            results = run_evaluation(eval_request=eval_request, task_names=[task.benchmark], num_fewshot=task.num_fewshot,
                                     batch_size=batch_size, device=DEVICE, use_cache=None, limit=LIMIT)
        else:
            raise

@StellaAthena
Member

(quoting the workaround snippet above)

This seems like an improvement over the current code to me. Whether or not we write a more robust fix later, opening a PR with this would be an improvement.

@pminervini
Contributor

pminervini commented Feb 7, 2024

@StellaAthena I think we can do that here:

batch_size = forward_batch()

Let me make a pull request real quick -- @StellaAthena, done (#1405); feel free to double-check that!
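
For reference, a minimal sketch of that kind of fallback around the detection call (the actual change lives in #1405; see the PR for the real diff):

    try:
        batch_size = forward_batch()
    except RuntimeError as e:
        if "No executable batch size found" in str(e):
            # accelerate's find_executable_batch_size reached zero; fall back to batch size 1.
            batch_size = 1
        else:
            raise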
