RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch) #1021

Closed
chiba1sonny opened this issue Nov 8, 2021 · 6 comments

@chiba1sonny

Batch size = 1, and I used FP16. I still got this error: CUDA out of memory.

RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1595629403081/work/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f09dcb9c77d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x20626 (0x7f09dcdf4626 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x214f4 (0x7f09dcdf54f4 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDACachingAllocator::raw_alloc(unsigned long) + 0x5e (0x7f09dcdee12e in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: + 0xcb2a06 (0x7f09ddcb4a06 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcb74ec (0x7f09ddcb94ec in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xcafeba (0x7f09ddcb1eba in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xcb06ce (0x7f09ddcb26ce in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xcb0d90 (0x7f09ddcb2d90 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x49 (0x7f09ddcb2fe9 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xd119bb (0x7f09ddd139bb in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd415f8 (0x7f09ddd435f8 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: at::cudnn_convolution_backward_weight(c10::ArrayRef, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool) + 0x1ad (0x7f0a0ff6870d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x18a (0x7f09ddcacc0a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0xd118c5 (0x7f09ddd138c5 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #15: + 0xd41654 (0x7f09ddd43654 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0x2c250c2 (0x7f0a11c3b0c2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x2c39684 (0x7f0a11c4f684 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef, c10::ArrayRef, c10::ArrayRef, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocatorat::Tensor >&&) + 0x258 (0x7f0a11ac2098 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: + 0x30d1017 (0x7f0a120e7017 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptrtorch::autograd::GraphTask&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptrtorch::autograd::ReadyQueue const&) + 0x1400 (0x7f0a120e2860 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptrtorch::autograd::GraphTask const&) + 0x451 (0x7f0a120e3401 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x89 (0x7f0a120db579 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptrtorch::autograd::ReadyQueue const&, bool) + 0x4a (0x7f0a1640a99a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: + 0xc9039 (0x7f0a18f42039 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #27: + 0x9609 (0x7f0a3b0ce609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x43 (0x7f0a3aff5293 in /lib/x86_64-linux-gnu/libc.so.6)

@MengzhangLI
Contributor

Hi, Maruyama-san:

Which model are you using? A Transformer model?

If the batch size is already 1 and you are using FP16, you could try making the crop size smaller, like here.
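For reference, the crop size lives in the dataset config's train pipeline. Below is a minimal sketch assuming the master (0.x) config layout; the concrete numbers (crop_size, img_scale) are placeholders, not the linked config's actual values:

```python
# Sketch of the train-pipeline section of a dataset config (0.x layout).
# Shrinking crop_size is what reduces per-sample activation memory.
crop_size = (256, 256)  # pick the largest size that still fits on the GPU
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='Resize', img_scale=(1920, 1080), ratio_range=(0.5, 2.0)),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
    dict(type='RandomFlip', prob=0.5),
    dict(type='Normalize',
         mean=[123.675, 116.28, 103.53],
         std=[58.395, 57.12, 57.375],
         to_rgb=True),
    dict(type='Pad', size=crop_size, pad_val=0, seg_pad_val=255),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_semantic_seg']),
]
```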

@chiba1sonny
Author

Thank you for your quick reply. I am using the U-Net model, and after setting the crop size smaller, it actually worked.
But could you tell me how crop size works?
Thanks in advance.

@MengzhangLI
Contributor

It seems like your input data is very big?
From our normal [config](https://github.com/open-mmlab/mmsegmentation/tree/master/configs/unet), training usually takes less than 1 GB of GPU memory.
The crop size is the size of the actual input to the model; that input and its activations are held in GPU memory together with the model parameters, so we can make it smaller to save GPU memory.
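As a standalone illustration (not part of mmsegmentation), peak activation memory grows roughly with the crop area. This toy PyTorch sketch just measures the peak allocation for two crop sizes on a tiny stand-in network:

```python
import torch
import torch.nn as nn

# Toy stand-in for a segmentation network; the real model is much larger,
# but the trend is the same: activations grow with the spatial crop size.
model = nn.Sequential(
    nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 64, 3, padding=1), nn.ReLU(),
    nn.Conv2d(64, 2, 1),
).cuda()

for crop in (512, 800):
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(1, 3, crop, crop, device='cuda')
    model(x).sum().backward()  # forward + backward keeps activations alive
    peak_mib = torch.cuda.max_memory_allocated() / 1024 ** 2
    print(f'crop {crop}x{crop}: peak allocated {peak_mib:.0f} MiB')
```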

@MengzhangLI
Contributor

(1) Besides making the crop size smaller, you could also try setting cudnn_benchmark = False here (a sketch of the change follows below) and see whether that avoids the CUDA out-of-memory error.

(2) If your customized dataset is not medical images, you could try some other models, such as BiSeNetV2 (FP16), BiSeNetV1, and PSPNet.
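For item (1), the change is a single flag in the runtime config; a minimal sketch assuming the 0.x config layout, where the flag lives in configs/_base_/default_runtime.py (path from memory):

```python
# In the runtime config (or your own config that inherits from it).
# cuDNN benchmarking caches per-shape convolution algorithms and can pick
# memory-hungry ones; disabling it trades some speed for lower memory use.
cudnn_benchmark = False
```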

@MengzhangLI MengzhangLI self-assigned this Nov 8, 2021
@chiba1sonny
Author

Thank you for your guidance.
My image size is 1920×1080, and I made the crop size 800×800. It’s working.

Thanks!!!
I will try BiSeNet.

@MengzhangLI
Contributor

> Thank you for your guidance. My image size is 1920×1080, and I made the crop size 800×800. It’s working.
>
> Thanks!!! I will try BiSeNet.

Not too big. If your images are not medical images, try one of our models that uses a pretrained backbone; it would get better results than U-Net, which is trained from scratch.
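As an illustration of "a model with a pretrained backbone", this is roughly how the 0.x PSPNet base model config points at ImageNet-pretrained weights (abridged from memory; check configs/_base_/models/pspnet_r50-d8.py in the repo for the exact fields):

```python
# Abridged model config: the `pretrained` field loads ImageNet weights into
# the backbone instead of training it from scratch like the U-Net configs.
norm_cfg = dict(type='SyncBN', requires_grad=True)
model = dict(
    type='EncoderDecoder',
    pretrained='open-mmlab://resnet50_v1c',
    backbone=dict(
        type='ResNetV1c',
        depth=50,
        norm_cfg=norm_cfg,
        norm_eval=False),
    # decode_head / auxiliary_head omitted here for brevity
)
```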
