Batch size auto OOM #1678

Open
sam-paech opened this issue Apr 6, 2024 · 5 comments
Labels
bug Something isn't working.

Comments

@sam-paech

Anybody else get constant OOM with batch size set to auto?

I "fixed" it with a stupid but effective workaround:

models/huggingface.py

    def _detect_batch_size(self, requests=None, pos: int = 0):
        if requests:
            _, context_enc, continuation_enc = requests[pos]
            max_length = len(
                (context_enc + continuation_enc)[-(self.max_length + 1) :][:-1]
            )
            max_context_enc = len(context_enc[-(self.max_length + 1) :])
            max_cont_enc = len(continuation_enc[-(self.max_length + 1) :])
        else:
            max_length = self.max_length

        # if OOM, then halves batch_size and tries again
        @find_executable_batch_size(starting_batch_size=self.max_batch_size)
        def forward_batch(batch_size):
            if self.AUTO_MODEL_CLASS == transformers.AutoModelForSeq2SeqLM:
                length = max(max_context_enc, max_cont_enc)
                batched_conts = torch.ones(
                    (batch_size, length), device=self.device
                ).long()
                test_batch = torch.ones((batch_size, length), device=self.device).long()
                call_kwargs = {
                    "attn_mask": test_batch,
                    "labels": batched_conts,
                }
            else:
                call_kwargs = {}
                test_batch = torch.ones(
                    (batch_size, max_length), device=self.device
                ).long()
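            # run the test forward pass a few times so the chosen batch size
            # has to survive repeated calls, not just a single one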
            for _ in range(5):
                out = F.log_softmax(self._model_call(test_batch, **call_kwargs), dim=-1)  # noqa: F841

            return batch_size

        try:
            batch_size = forward_batch()
        except RuntimeError as e:
            if "No executable batch size found" in str(e):
                batch_size = 1
            else:
                raise

        if self.world_size > 1:
            # if multi-GPU, always take minimum over all selected batch sizes
            max_rnk_bs = torch.tensor([batch_size], device=self.device)
            gathered = (
                self.accelerator.gather(max_rnk_bs).cpu().detach().numpy().tolist()
            )
            batch_size = min(gathered)
            clear_torch_cache()
            # often get OOM with the auto batch size, so just return half of what it thinks
            if batch_size > 1:
                batch_size = int(round(batch_size/2))
            return batch_size

        clear_torch_cache()
        # often get OOM with the auto batch size, so just return half of what it thinks
        if batch_size > 1:
            batch_size = int(round(batch_size/2))
        return batch_size
@haileyschoelkopf
Contributor

haileyschoelkopf commented Apr 7, 2024

Hi!

        # often get OOM with the auto batch size, so just return half of what it thinks
        if batch_size > 1:
            batch_size = int(round(batch_size/2))

To clarify: it looks like this is what was added. You're seeing that auto-batch size is found successfully without OOM, but then later in evaluation there is still an OOM?

To help diagnose the issue: what model are you running with and what task type (generative, or loglikelihood-based) is being used? Or does this reliably happen across models?

Thanks!

@haileyschoelkopf added the bug label on Apr 7, 2024
@sam-paech
Author

> To clarify: it looks like this is what was added. You're seeing that auto-batch size is found successfully without OOM, but then later in evaluation there is still an OOM?

Yep, exactly.

> To help diagnose the issue: what model are you running with and what task type (generative, or loglikelihood-based) is being used? Or does this reliably happen across models?

I was getting this issue when running MMLU and AGIEval w/ transformers using logprobs. It happens with some models and not others; I would say maybe 10-20% of the models I was testing (out of ~100 or so) had this issue. If I recall, I was getting it quite a lot with 34b models. I'll see if I can come up with exact settings to repro.

@sam-paech
Author

(this is using a fresh install of latest lm-eval on a 3090 runpod)

lm_eval --model hf --model_args pretrained=cognitivecomputations/dolphin-2_2-yi-34b,load_in_4bit=True,bnb_4bit_compute_dtype=float16,max_length=4096,trust_remote_code=True --tasks mmlu,agieval --device cuda:0 --batch_size auto:9 --verbosity DEBUG --log_samples --output_path output/dolphin-2_2-yi-34b --use_cache sqlite_cache_dolphin-2_2-yi-34b

2024-04-08:04:14:20,245 INFO     [model.py:249] Cached requests: 76, Requests remaining: 85955
Running loglikelihood requests:   0%|                                         | 0/85955 [00:00<?, ?it/s]Passed argument batch_size = auto:9.0. Detecting largest batch size
Determined largest batch size: 2
Running loglikelihood requests:   0%|                             | 3/85955 [00:28<183:05:17,  7.67s/it]Traceback (most recent call last):
...
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 776.00 MiB. GPU 0 has a total capacty of 23.68 GiB of which 589.81 MiB is free. Process 833781 has 23.10 GiB memory in use. Of the allocated memory 21.45 GiB is allocated by PyTorch, and 1.35 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

So it detects the batch size as 2 without errors, then throws OOM after a few samples have been processed.
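
(As an aside, the traceback's own suggestion about max_split_size_mb can be tried by prefixing the same command with the allocator env var; the 128 MiB split size below is just an illustrative value, and whether it avoids this particular OOM is untested:)

PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 lm_eval --model hf --model_args pretrained=cognitivecomputations/dolphin-2_2-yi-34b,load_in_4bit=True,bnb_4bit_compute_dtype=float16,max_length=4096,trust_remote_code=True --tasks mmlu,agieval --device cuda:0 --batch_size auto:9 --verbosity DEBUG --log_samples --output_path output/dolphin-2_2-yi-34b --use_cache sqlite_cache_dolphin-2_2-yi-34b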

@haileyschoelkopf
Contributor

Hm, thank you for sharing the command!

My best guesses right now are that either 1) this batch size is just really close to the card's max, and a couple of batches in it gets pushed over the limit by fragmentation or something similar, or 2) we're accidentally not truncating something somewhere.
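
A minimal sketch of what guess 1) would imply, assuming a hypothetical post-check on the detected batch size (apply_headroom and the 10% threshold are made up for illustration; torch.cuda.mem_get_info is the actual PyTorch call that reports free/total device memory):

import torch

def apply_headroom(batch_size: int, min_free_fraction: float = 0.10) -> int:
    # Hypothetical helper, not part of lm-eval: if the detection probe left
    # less than ~10% of VRAM free, assume we are too close to the card's max
    # and back the detected batch size off by half.
    free_bytes, total_bytes = torch.cuda.mem_get_info()
    if batch_size > 1 and free_bytes / total_bytes < min_free_fraction:
        batch_size = max(1, batch_size // 2)
    return batch_size

That is essentially what the halving workaround above does unconditionally; gating it on measured free memory would at least keep the full detected batch size when there is real headroom.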

Am away this week but will try to investigate ASAP.

@MaciejMarkiewicz

Any updates on this issue? I am getting the exact same error with specific models (if it happens for a model, it always happens), in both single- and multi-GPU scenarios. Some models run without errors in 0-shot scenarios but get OOM in few-shot (or vice versa).
