RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch) #1021
Comments
Hi, Maruyama-san: Which model are you using? Is it a Transformer model? If the batch size is already 1 and you are already using FP16, you could try making the crop size smaller, like here.
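For readers without the linked example, the suggestion can be sketched as a config fragment. This is only a sketch under assumptions: it follows the MMSegmentation 0.x-style config keys (`crop_size`, `RandomCrop`, `samples_per_gpu`, `Fp16OptimizerHook`), the values are illustrative, and it is not the exact config from this thread. The remaining pipeline steps (resize, normalization, padding, formatting) are omitted and would stay as in your base config.

```python
# Hypothetical config fragment, MMSegmentation 0.x style. Values are illustrative.

# Smaller crop: fewer pixels per training sample, so smaller feature maps on the GPU.
crop_size = (256, 256)

# Only the steps relevant to cropping are shown; a real pipeline needs more steps.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations'),
    dict(type='RandomCrop', crop_size=crop_size, cat_max_ratio=0.75),
]

# Batch size of 1 per GPU, as already used in this thread.
data = dict(samples_per_gpu=1)

# FP16 training, as already used in this thread (assumed 0.x-style settings).
optimizer_config = dict(type='Fp16OptimizerHook', loss_scale=512.)
fp16 = dict()
```

If the model config also refers to the crop size (for example in sliding-window test settings), that value may need to be kept consistent with the training crop.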
Thank you for your quick reply. I am using the U-Net model. After making the crop size smaller, it actually worked.
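A rough way to see why a smaller crop helps: in a fully convolutional network, feature-map memory grows roughly in proportion to the number of input pixels. The helper below is a back-of-the-envelope sketch under that assumption; the function name and the crop sizes are hypothetical, not values from this thread, and it ignores weights, optimizer state, and cuDNN workspace.

```python
def relative_activation_memory(old_crop, new_crop):
    """Approximate activation-memory ratio after changing the crop size.

    Assumes activation memory scales linearly with height * width,
    which is only a rule of thumb for convolutional models.
    """
    old_h, old_w = old_crop
    new_h, new_w = new_crop
    return (new_h * new_w) / (old_h * old_w)


# Illustrative sizes only: halving each side quarters the pixel count.
ratio = relative_activation_memory((512, 512), (256, 256))
print(f"{ratio:.2f}")  # prints 0.25
```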
It seems your input data is very big?
(1) Besides making the crop size smaller, you could also try to make (2) If your customized dataset is not medical images, you could try some other models, such as BiSeNetV2 (FP16), BiSeNetV1, and PSPNet.
Thank you for your guidance. Thanks!!!
Not too big. If it is not medical images, try one of our models that uses a pretrained backbone. It should get better results than UNet, which is trained from scratch.
Batch size is 1 and FP16 is enabled, but I still get this error: CUDA out of memory.
RuntimeError: CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; 8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)
Exception raised from malloc at /opt/conda/conda-bld/pytorch_1595629403081/work/c10/cuda/CUDACachingAllocator.cpp:272 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f09dcb9c77d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: + 0x20626 (0x7f09dcdf4626 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #2: + 0x214f4 (0x7f09dcdf54f4 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #3: c10::cuda::CUDACachingAllocator::raw_alloc(unsigned long) + 0x5e (0x7f09dcdee12e in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libc10_cuda.so)
frame #4: + 0xcb2a06 (0x7f09ddcb4a06 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #5: + 0xcb74ec (0x7f09ddcb94ec in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #6: + 0xcafeba (0x7f09ddcb1eba in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #7: + 0xcb06ce (0x7f09ddcb26ce in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #8: + 0xcb0d90 (0x7f09ddcb2d90 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #9: at::native::cudnn_convolution_backward_weight(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0x49 (0x7f09ddcb2fe9 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #10: + 0xd119bb (0x7f09ddd139bb in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #11: + 0xd415f8 (0x7f09ddd435f8 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #12: at::cudnn_convolution_backward_weight(c10::ArrayRef<long>, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool) + 0x1ad (0x7f0a0ff6870d in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: at::native::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x18a (0x7f09ddcacc0a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #14: + 0xd118c5 (0x7f09ddd138c5 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #15: + 0xd41654 (0x7f09ddd43654 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #16: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #17: + 0x2c250c2 (0x7f0a11c3b0c2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #18: + 0x2c39684 (0x7f0a11c4f684 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #19: at::cudnn_convolution_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, c10::ArrayRef<long>, c10::ArrayRef<long>, c10::ArrayRef<long>, long, bool, bool, std::array<bool, 2ul>) + 0x1e2 (0x7f0a0ff776a2 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #20: torch::autograd::generated::CudnnConvolutionBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x258 (0x7f0a11ac2098 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #21: + 0x30d1017 (0x7f0a120e7017 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #22: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&, std::shared_ptr<torch::autograd::ReadyQueue> const&) + 0x1400 (0x7f0a120e2860 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #23: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&) + 0x451 (0x7f0a120e3401 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #24: torch::autograd::Engine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x89 (0x7f0a120db579 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #25: torch::autograd::python::PythonEngine::thread_init(int, std::shared_ptr<torch::autograd::ReadyQueue> const&, bool) + 0x4a (0x7f0a1640a99a in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #26: + 0xc9039 (0x7f0a18f42039 in /home/maruyama/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/lib/../../../.././libstdc++.so.6)
frame #27: + 0x9609 (0x7f0a3b0ce609 in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #28: clone + 0x43 (0x7f0a3aff5293 in /lib/x86_64-linux-gnu/libc.so.6)
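The numbers in the error message can be pulled apart to see how tight memory was when the allocation failed. The snippet below is a small sketch that parses the message quoted above with regular expressions; the helper name `parse_oom` is hypothetical, and the interpretation in the comments is a plausible reading rather than a diagnosis. Note that the trace fails inside `cudnn_convolution_backward_weight`, so the request came from the backward pass.

```python
import re

# Error message copied from the report above.
MESSAGE = (
    "CUDA out of memory. Tried to allocate 850.00 MiB (GPU 0; 10.91 GiB total capacity; "
    "8.69 GiB already allocated; 863.44 MiB free; 8.98 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return the sizes quoted in a PyTorch OOM message, converted to MiB."""
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        value, unit = match.groups()
        sizes[name] = float(value) * UNIT_TO_MIB[unit]
    return sizes


sizes = parse_oom(MESSAGE)

# Memory PyTorch holds in its cache but has not handed out to tensors.
cached_unused = sizes["reserved"] - sizes["allocated"]

print(f"requested:                  {sizes['requested']:.2f} MiB")
print(f"reported free:              {sizes['free']:.2f} MiB")
print(f"reserved but not allocated: {cached_unused:.2f} MiB")
print(f"allocated share of total:   {sizes['allocated'] / sizes['total']:.0%}")
```

The request (850 MiB) is only slightly below the reported free memory (863.44 MiB), so there was almost no headroom, and a single large block can still fail when the free memory is not contiguous. Reducing the per-sample footprint, as suggested in the comments, addresses this directly.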