OutOfMemoryError while training the cross-encoder #22

Closed
kunalr97 opened this issue Nov 27, 2023 · 6 comments

kunalr97 commented Nov 27, 2023

train_args = CrossEncoderTrainingArgs(num_train_epochs=5)

rr = CrossEncoderReranker()
output_dir = f'../outputs/{label2dict[label]}_index/cross_encoder_training/'

rr.fit(
    train_dataset=train,
    val_dataset=val,
    output_dir=output_dir,
    training_args=train_args,
    show_progress_bar=False
)

When I try to train the cross-encoder on the BRONCO dataset to predict ICD codes for the diagnosis entities, I get this error:

OutOfMemoryError: CUDA out of memory. Tried to allocate 768.00 MiB (GPU 0; 15.77 GiB total capacity; 14.34 GiB already allocated; 379.12 MiB free; 15.03 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I tried running the following, but it does not seem to help. There are also no other processes running on the GPU.

import torch
torch.cuda.empty_cache()

Thanks in advance for your help.

phlobo commented Nov 27, 2023

Hello!

The cross-encoder is indeed quite memory intensive (I tested everything with 48 GB of GPU memory). Two things that might work:

  1. I'm not sure whether all memory allocated by SapBERT is cleared by empty_cache(); you might instead want to save the candidate dataset to disk and restart the process / notebook to make sure CUDA memory is entirely freed up (see the sketch after this list).

  2. You can reduce the memory footprint of the cross-encoder by reducing the number of candidates subject to re-ranking (which equals the batch size) to something like 16 instead of 64.
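
For option 1, a minimal sketch of what that could look like. It assumes the candidate dataset is a Hugging Face datasets object (so it has save_to_disk), and candidates / sap_bert_linker are placeholders for whatever variable names you use in your own script:

import gc
import torch
from datasets import load_from_disk

# Persist the candidates so the expensive SapBERT step does not have to be
# re-run after a restart (assumes a Hugging Face Dataset/DatasetDict).
candidates.save_to_disk("../outputs/candidates")

# Drop the reference to the candidate generator and release cached CUDA memory.
# sap_bert_linker is a placeholder for the variable that holds it in your code.
del sap_bert_linker
gc.collect()
torch.cuda.empty_cache()

# Most reliable: restart the process / notebook, then reload the candidates.
candidates = load_from_disk("../outputs/candidates")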

phlobo commented Nov 27, 2023

Another thing that might work (though I have not tested the performance) would be to use a smaller BERT model, e.g.,

train_args = CrossEncoderTrainingArgs(model_name="distilbert-base-multilingual-cased")
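
Plugged into the snippet from your first post, that would look roughly like this (assuming CrossEncoderTrainingArgs accepts model_name together with num_train_epochs, and reusing your train / val splits and output_dir):

# Smaller multilingual model to reduce GPU memory; combining both arguments
# is an assumption based on the two snippets in this thread.
train_args = CrossEncoderTrainingArgs(
    model_name="distilbert-base-multilingual-cased",
    num_train_epochs=5,
)

rr = CrossEncoderReranker()
rr.fit(
    train_dataset=train,
    val_dataset=val,
    output_dir=output_dir,
    training_args=train_args,
    show_progress_bar=False,
)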

kunalr97 (Author) commented:

Hi,
Thanks for your quick response. I will try this and hope that it works. Where exactly do I need to do this?

2. You can reduce the memory footprint of the cross-encoder by reducing the number of candidates subject to re-ranking (which equals the batch size) to something like 16 instead of 64.

Thanks in advance

phlobo commented Nov 27, 2023

There are multiple steps at which you can reduce the number of candidates. However, if you follow this notebook (https://github.com/hpi-dhc/xmen/blob/main/examples/02_BRONCO.ipynb), then setting K_RERANKING = 16 just before calling CrossEncoderReranker.prepare_data should do the trick.
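
As a rough sketch (following the notebook; the objects passed to prepare_data are placeholders for what the earlier notebook cells produce, and the exact call may differ in your xMEN version):

# Fewer candidates per mention = smaller cross-encoder batches = less GPU memory.
K_RERANKING = 16  # down from 64

# Placeholder call: candidates, dataset and kb stand in for the objects created
# earlier in 02_BRONCO.ipynb (candidate sets, annotated dataset, knowledge base).
ce_data = CrossEncoderReranker.prepare_data(candidates, dataset, kb)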

Note: I assume that this will cost you a bit of recall@1, but it might actually increase precision. To get precision, recall, and F1 scores at the end, use evaluate instead of evaluate_at_k.

kunalr97 (Author) commented:

Thanks a lot! I don't get that error now.

phlobo commented Nov 27, 2023

Thank you for pointing this out; I have linked this thread in the README.

@phlobo phlobo added documentation Improvements or additions to documentation question Further information is requested labels Nov 28, 2023
@phlobo phlobo self-assigned this Dec 4, 2023