Batch size auto is wrong? #1323

Closed
djstrong opened this issue Jan 19, 2024 · 8 comments · Fixed by #1405

Comments

@djstrong
Contributor

Batch size auto is not working correctly (with generate_until tasks?).

lm_eval --model hf --model_args pretrained=HuggingFaceH4/zephyr-7b-alpha,dtype=bfloat16 --tasks polemo2_in --device cuda:0 --batch_size auto

Traceback (most recent call last):
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/venv/bin/lm_eval", line 8, in <module>
    sys.exit(cli_evaluate())
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/__main__.py", line 231, in cli_evaluate
    results = evaluator.simple_evaluate(
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/evaluator.py", line 150, in simple_evaluate
    results = evaluate(
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/utils.py", line 415, in _wrapper
    return fn(*args, **kwargs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/evaluator.py", line 325, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/models/huggingface.py", line 1051, in generate_until
    batch_size = self._detect_batch_size()
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/lm-evaluation-harness/lm_eval/models/huggingface.py", line 610, in _detect_batch_size
    batch_size = forward_batch()
  File "/net/tscratch/people/plgkwrobel/llm-benchmark/venv/lib/python3.10/site-packages/accelerate/utils/memory.py", line 134, in decorator
    raise RuntimeError("No executable batch size found, reached zero.")
RuntimeError: No executable batch size found, reached zero.

The benchmark works with a batch size of at least 2 (on a 40 GB VRAM card).

@haileyschoelkopf
Collaborator

Thanks for reporting this, will check it out!

I suspect that this is because Mistral has a max length of 32768 in its config, and out of caution we calculate our max batch size based on the model's reported max length, to ensure no OOMs will occur at that batch size.

You should be able to get around this with --batch_size auto by also passing --model_args max_length=4096, or some other value that is greater than the length of the documents in this task.
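
For example, reusing the command from the report above (the max_length value here is only illustrative), this would look something like:

lm_eval --model hf --model_args pretrained=HuggingFaceH4/zephyr-7b-alpha,dtype=bfloat16,max_length=4096 --tasks polemo2_in --device cuda:0 --batch_size auto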

@djstrong
Contributor Author

Thank you! I had thought the batch size calculation was based on the actual data (especially since it can be recalculated during evaluation).

@haileyschoelkopf
Collaborator

I believe the first instance of batch size calculation is anomalous in this respect (which is perhaps worth changing).

@pminervini
Contributor

@haileyschoelkopf could it make sense to just set the auto batch size to one rather than raising "No executable batch size found, reached zero."?

@djstrong
Contributor Author

djstrong commented Feb 1, 2024

Another option is to just run with a defined batch size and, if an OOM occurs, decrease it and repeat.
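
A rough sketch of that decrease-and-retry strategy (a hypothetical helper, not the harness's actual code) could look like this:

    import torch

    def run_with_retry(run_fn, batch_size):
        # Halve the batch size on a CUDA OOM until the call succeeds or we reach zero.
        while batch_size >= 1:
            try:
                return run_fn(batch_size)
            except torch.cuda.OutOfMemoryError:
                torch.cuda.empty_cache()
                batch_size //= 2
        raise RuntimeError("No batch size worked, even batch_size=1.")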

@pminervini
Contributor

pminervini commented Feb 1, 2024

A potential hacky workaround:

    batch_size = "auto"
    try:
        results = run_evaluation(eval_request=eval_request, task_names=[task.benchmark], num_fewshot=task.num_fewshot,
                                 batch_size=batch_size, device=DEVICE, use_cache=None, limit=LIMIT)
    except RuntimeError as e:
        if "No executable batch size found" in str(e):
            # Auto-detection reached zero; retry with a fixed batch size of 1.
            batch_size = 1
            results = run_evaluation(eval_request=eval_request, task_names=[task.benchmark], num_fewshot=task.num_fewshot,
                                     batch_size=batch_size, device=DEVICE, use_cache=None, limit=LIMIT)
        else:
            raise

@StellaAthena
Member

(quoting the workaround snippet above)

This seems like an improvement over the current code to me. Whether or not we write a more robust fix later, opening a PR with this would be an improvement.

@pminervini
Contributor

pminervini commented Feb 7, 2024

@StellaAthena I think we can do that here:

batch_size = forward_batch()

Let me make a pull request real quick -- @StellaAthena, done (#1405); feel free to double-check that!
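
For reference, a minimal sketch of that kind of fallback around the detection call (the actual change lives in #1405; see the PR for the real diff):

    try:
        batch_size = forward_batch()
    except RuntimeError as e:
        if "No executable batch size found" in str(e):
            # accelerate's find_executable_batch_size reached zero; fall back to batch size 1.
            batch_size = 1
        else:
            raise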
