Replies: 1 comment
-
@axife 👋 Hello! Thanks for asking about CUDA memory issues. While we don't have a specific table of hardware requirements, we do have general guidelines on reducing CUDA memory usage during training. I've pasted them below. Note that they were written for YOLOv5, but they apply broadly to all PyTorch GPU training.

YOLO 🚀 can be trained on CPU, single-GPU, or multi-GPU. When training on a GPU it is important to keep your batch size small enough that you do not use all of your GPU memory, otherwise you will see a CUDA Out Of Memory (OOM) error and your training will crash. You can observe your CUDA memory utilization with the nvidia-smi command or by watching your console output during training.
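As a minimal sketch, you can also check utilization from Python (assuming only that PyTorch is installed, as in the Colab environment shown further down):

```python
import torch

def print_gpu_memory(device: int = 0) -> None:
    """Print current, reserved, and peak CUDA memory usage in GiB."""
    gib = 1024 ** 3
    # Memory occupied by live tensors on the device.
    print(f"allocated:     {torch.cuda.memory_allocated(device) / gib:.2f} GiB")
    # Memory held by PyTorch's caching allocator (may exceed 'allocated').
    print(f"reserved:      {torch.cuda.memory_reserved(device) / gib:.2f} GiB")
    # High-water mark since the process started.
    print(f"max allocated: {torch.cuda.max_memory_allocated(device) / gib:.2f} GiB")

print_gpu_memory()
```

In Colab you can also run `!nvidia-smi` in a cell for a process-level view.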
CUDA Out of Memory Solutions

If you encounter a CUDA OOM error, the steps you can take to reduce your memory usage are:
- Reduce your batch size (--batch-size in YOLOv5, batch in YOLOv8)
- Reduce your image size (--img-size in YOLOv5, imgsz in YOLOv8)
- Use a smaller model, e.g. yolov8s.pt -> yolov8n.pt
- Train on a GPU with more CUDA memory

AutoBatch

You can use AutoBatch (NEW) to find the best batch size for your training by passing --batch-size -1 in YOLOv5 (batch=-1 in YOLOv8). AutoBatch automatically solves for the largest batch size that fits in your available CUDA memory; see the sketch below.

Good luck 🍀 and let us know if you have any other questions!
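A minimal sketch of both approaches with the Ultralytics Python API (the model and dataset names mirror the training log below; batch=-1 enables AutoBatch in recent ultralytics releases):

```python
from ultralytics import YOLO

# Start from a smaller pretrained checkpoint to lower memory pressure
# (yolov8n.pt is the lightest YOLOv8 variant).
model = YOLO("yolov8n.pt")

# Option 1: manually shrink batch size and image size.
model.train(data="VisDrone.yaml", epochs=100, imgsz=512, batch=4)

# Option 2: let AutoBatch pick the largest batch size that fits in
# CUDA memory by passing batch=-1 (in practice, use one option, not both).
model.train(data="VisDrone.yaml", epochs=100, imgsz=640, batch=-1)
```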
-
I'm trying to follow the HUB forms to train a model. I was able to train a model on the smallest dataset, COCO128, in Google Colab.
But when I try to train on a bigger dataset like VisDrone, I get an out-of-memory error. I'm using Colab with a T4 GPU that has 15 GB of RAM.
I tried playing with batch values like 8 and 4,
and I used the proposed pre-trained model,
but it always fails with out of memory.
Is there a table that shows what hardware I need for a given training job?
Thanks
Artem
Ultralytics HUB: Authenticated ✅
Ultralytics HUB: View model at https://hub.ultralytics.com/models/79M1AWwDkOGUIIVgmpF9 🚀
Ultralytics YOLOv8.0.93 🚀 Python-3.10.11 torch-2.0.0+cu118 CUDA:0 (Tesla T4, 15102MiB)
yolo/engine/trainer: task=detect, mode=train, model=yolov8s.pt, data=VisDrone.yaml, epochs=100, patience=100, batch=4, imgsz=640, save=True, save_period=-1, cache=None, device=, workers=8, project=None, name=None, exist_ok=False, pretrained=False, optimizer=SGD, verbose=True, seed=0, deterministic=True, single_cls=False, rect=False, cos_lr=False, close_mosaic=0, resume=False, amp=True, overlap_mask=True, mask_ratio=4, dropout=0.0, val=True, split=val, save_json=False, save_hybrid=False, conf=None, iou=0.7, max_det=300, half=False, dnn=False, plots=True, source=None, show=False, save_txt=False, save_conf=False, save_crop=False, show_labels=True, show_conf=True, vid_stride=1, line_width=None, visualize=False, augment=False, agnostic_nms=False, classes=None, retina_masks=False, boxes=True, format=torchscript, keras=False, optimize=False, int8=False, dynamic=False, simplify=False, opset=None, workspace=4, nms=False, lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=7.5, cls=0.5, dfl=1.5, pose=12.0, kobj=1.0, label_smoothing=0.0, nbs=64, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0, cfg=None, v5loader=False, tracker=botsort.yaml, save_dir=runs/detect/train6
Overriding model.yaml nc=80 with nc=10
                   from  n    params  module                                  arguments
  0                  -1  1       928  ultralytics.nn.modules.Conv             [3, 32, 3, 2]
  1                  -1  1     18560  ultralytics.nn.modules.Conv             [32, 64, 3, 2]
  2                  -1  1     29056  ultralytics.nn.modules.C2f              [64, 64, 1, True]
  3                  -1  1     73984  ultralytics.nn.modules.Conv             [64, 128, 3, 2]
  4                  -1  2    197632  ultralytics.nn.modules.C2f              [128, 128, 2, True]
  5                  -1  1    295424  ultralytics.nn.modules.Conv             [128, 256, 3, 2]
  6                  -1  2    788480  ultralytics.nn.modules.C2f              [256, 256, 2, True]
  7                  -1  1   1180672  ultralytics.nn.modules.Conv             [256, 512, 3, 2]
  8                  -1  1   1838080  ultralytics.nn.modules.C2f              [512, 512, 1, True]
  9                  -1  1    656896  ultralytics.nn.modules.SPPF             [512, 512, 5]
 10                  -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 11             [-1, 6]  1         0  ultralytics.nn.modules.Concat           [1]
 12                  -1  1    591360  ultralytics.nn.modules.C2f              [768, 256, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']
 14             [-1, 4]  1         0  ultralytics.nn.modules.Concat           [1]
 15                  -1  1    148224  ultralytics.nn.modules.C2f              [384, 128, 1]
 16                  -1  1    147712  ultralytics.nn.modules.Conv             [128, 128, 3, 2]
 17            [-1, 12]  1         0  ultralytics.nn.modules.Concat           [1]
 18                  -1  1    493056  ultralytics.nn.modules.C2f              [384, 256, 1]
 19                  -1  1    590336  ultralytics.nn.modules.Conv             [256, 256, 3, 2]
 20             [-1, 9]  1         0  ultralytics.nn.modules.Concat           [1]
 21                  -1  1   1969152  ultralytics.nn.modules.C2f              [768, 512, 1]
 22        [15, 18, 21]  1   2119918  ultralytics.nn.modules.Detect           [10, [128, 256, 512]]
Model summary: 225 layers, 11139470 parameters, 11139454 gradients, 28.7 GFLOPs
Transferred 349/355 items from pretrained weights
TensorBoard: Start with 'tensorboard --logdir runs/detect/train6', view at https://localhost:6006/
OutOfMemoryError Traceback (most recent call last)
in <cell line: 4>()
2
3 model = YOLO('https://hub.ultralytics.com/models/79M1AWwDkOGUIIVgmpF9')
----> 4 model.train()
10 frames
/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py in convert(t)
1141 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None,
1142 non_blocking, memory_format=convert_to_format)
-> 1143 return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
1144
1145 return self._apply(convert)
OutOfMemoryError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 14.75 GiB total capacity; 12.84 GiB already allocated; 832.00 KiB free; 13.36 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
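For reference, the allocator hint from the error message can be applied as below (a sketch assuming PyTorch 2.0 as in the log above; the variable must be set before CUDA is first initialized). Note that in this run reserved memory (13.36 GiB) is only slightly above allocated memory (12.84 GiB), so the GPU is simply near-full and reducing batch size, image size, or model size is the more likely fix:

```python
import os

# Limit the size of allocator blocks to reduce fragmentation.
# This must run before anything initializes CUDA, or it has no effect.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

from ultralytics import YOLO  # imported only after setting the env var

model = YOLO("yolov8s.pt")
model.train(data="VisDrone.yaml", epochs=100, imgsz=640, batch=4)
```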