Batch size auto OOM #1678

Open
sam-paech opened this issue Apr 6, 2024 · 5 comments
Labels
bug Something isn't working.

Comments

@sam-paech

Anybody else get constant OOM with batch size set to auto?

I "fixed" it with a stupid but effective workaround:

models/huggingface.py

    def _detect_batch_size(self, requests=None, pos: int = 0):
        if requests:
            _, context_enc, continuation_enc = requests[pos]
            max_length = len(
                (context_enc + continuation_enc)[-(self.max_length + 1) :][:-1]
            )
            max_context_enc = len(context_enc[-(self.max_length + 1) :])
            max_cont_enc = len(continuation_enc[-(self.max_length + 1) :])
        else:
            max_length = self.max_length

        # if OOM, then halves batch_size and tries again
        @find_executable_batch_size(starting_batch_size=self.max_batch_size)
        def forward_batch(batch_size):
            if self.AUTO_MODEL_CLASS == transformers.AutoModelForSeq2SeqLM:
                length = max(max_context_enc, max_cont_enc)
                batched_conts = torch.ones(
                    (batch_size, length), device=self.device
                ).long()
                test_batch = torch.ones((batch_size, length), device=self.device).long()
                call_kwargs = {
                    "attn_mask": test_batch,
                    "labels": batched_conts,
                }
            else:
                call_kwargs = {}
                test_batch = torch.ones(
                    (batch_size, max_length), device=self.device
                ).long()
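            # run the test forward pass a few times so the chosen batch size
            # has to survive repeated calls, not just a single one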
            for _ in range(5):
                out = F.log_softmax(self._model_call(test_batch, **call_kwargs), dim=-1)  # noqa: F841

            return batch_size

        try:
            batch_size = forward_batch()
        except RuntimeError as e:
            if "No executable batch size found" in str(e):
                batch_size = 1
            else:
                raise

        if self.world_size > 1:
            # if multi-GPU, always take minimum over all selected batch sizes
            max_rnk_bs = torch.tensor([batch_size], device=self.device)
            gathered = (
                self.accelerator.gather(max_rnk_bs).cpu().detach().numpy().tolist()
            )
            batch_size = min(gathered)
            clear_torch_cache()
            # often get OOM with the auto batch size, so just return half of what it thinks
            if batch_size > 1:
                batch_size = int(round(batch_size/2))
            return batch_size

        clear_torch_cache()
        # often get OOM with the auto batch size, so just return half of what it thinks
        if batch_size > 1:
            batch_size = int(round(batch_size/2))
        return batch_size
@haileyschoelkopf
Contributor

haileyschoelkopf commented Apr 7, 2024

Hi!

        # often get OOM with the auto batch size, so just return half of what it thinks
        if batch_size > 1:
            batch_size = int(round(batch_size/2))

To clarify: it looks like this is what was added. You're seeing that auto-batch size is found successfully without OOM, but then later in evaluation there is still an OOM?

To help diagnose the issue: what model are you running with and what task type (generative, or loglikelihood-based) is being used? Or does this reliably happen across models?

Thanks!

@haileyschoelkopf added the bug label on Apr 7, 2024
@sam-paech
Author

> To clarify: it looks like this is what was added. You're seeing that auto-batch size is found successfully without OOM, but then later in evaluation there is still an OOM?

Yep, exactly.

> To help diagnose the issue: what model are you running with and what task type (generative, or loglikelihood-based) is being used? Or does this reliably happen across models?

I was getting this issue when running MMLU and AGIEval w/ transformers using logprobs. It happens with some models and not others; I would say maybe 10-20% of the models I was testing (out of ~100 or so) had this issue. If I recall, I was getting it quite a lot with 34b models. I'll see if I can come up with exact settings to repro.

@sam-paech
Author

(this is using a fresh install of latest lm-eval on a 3090 runpod)

lm_eval --model hf --model_args pretrained=cognitivecomputations/dolphin-2_2-yi-34b,load_in_4bit=True,bnb_4bit_compute_dtype=float16,max_length=4096,trust_remote_code=True --tasks mmlu,agieval --device cuda:0 --batch_size auto:9 --verbosity DEBUG --log_samples --output_path output/dolphin-2_2-yi-34b --use_cache sqlite_cache_dolphin-2_2-yi-34b

2024-04-08:04:14:20,245 INFO     [model.py:249] Cached requests: 76, Requests remaining: 85955
Running loglikelihood requests:   0%|                                         | 0/85955 [00:00<?, ?it/s]Passed argument batch_size = auto:9.0. Detecting largest batch size
Determined largest batch size: 2
Running loglikelihood requests:   0%|                             | 3/85955 [00:28<183:05:17,  7.67s/it]Traceback (most recent call last):
...
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 776.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 589.81 MiB is free. Process 833781 has 23.10 GiB memory in use. Of the allocated memory 21.45 GiB is allocated by PyTorch, and 1.35 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So it detects the batch size as 2 without errors, then throws OOM after a few samples have been processed.
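
(As an aside, the traceback's own suggestion about max_split_size_mb can be tried by prefixing the same command with the allocator env var; the 128 MiB split size below is just an illustrative value, and whether it avoids this particular OOM is untested:)

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 lm_eval --model hf --model_args pretrained=cognitivecomputations/dolphin-2_2-yi-34b,load_in_4bit=True,bnb_4bit_compute_dtype=float16,max_length=4096,trust_remote_code=True --tasks mmlu,agieval --device cuda:0 --batch_size auto:9 --verbosity DEBUG --log_samples --output_path output/dolphin-2_2-yi-34b --use_cache sqlite_cache_dolphin-2_2-yi-34b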

@haileyschoelkopf
Contributor

Hm, thank you for sharing the command!

My best guesses right now are that either 1) this batch size is just really close to the card's max, and a couple of batches in it gets pushed over the limit by fragmentation or something similar, or 2) we're accidentally not truncating something somewhere.
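
A minimal sketch of what guess 1) would imply, assuming a hypothetical post-check on the detected batch size (apply_headroom and the 10% threshold are made up for illustration; torch.cuda.mem_get_info is the actual PyTorch call that reports free/total device memory):

import torch

def apply_headroom(batch_size: int, min_free_fraction: float = 0.10) -> int:
    # Hypothetical helper, not part of lm-eval: if the detection probe left
    # less than ~10% of VRAM free, assume we are too close to the card's max
    # and back the detected batch size off by half.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    if batch_size > 1 and free_bytes / total_bytes < min_free_fraction:
        batch_size = max(1, batch_size // 2)
    return batch_size

That is essentially what the halving workaround above does unconditionally; gating it on measured free memory would at least keep the full detected batch size when there is real headroom.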

Am away this week but will try to investigate ASAP.

@MaciejMarkiewicz

Any updates on this issue? I am getting the exact same error with specific models (if it happens for a model, it always happens), in both single- and multi-GPU scenarios. Some models run without errors in 0-shot scenarios but get OOM in few-shot (or vice versa).
