
[Bug] RuntimeError: points_in_polygons_forward_impl: at param 2, inconsistent device: cuda:0 vs cuda:2 #609

Closed
3 tasks done
pangyanhua opened this issue Nov 10, 2022 · 1 comment

Comments

@pangyanhua

Prerequisite

Task

I'm using the official example scripts/configs for the officially supported tasks/models/datasets.

Branch

master branch https://github.com/open-mmlab/mmrotate

Environment

/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/__init__.py:21: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
'On January 1, 2023, MMCV will release v2.0.0, in which it will remove '
fatal: Not a git repository (or any parent up to mount point /home)
Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).
sys.platform: linux
Python: 3.7.13 (default, Mar 29 2022, 02:18:16) [GCC 7.5.0]
CUDA available: True
GPU 0,1,2: NVIDIA GeForce RTX 3090
CUDA_HOME: /usr/local/cuda
NVCC: Cuda compilation tools, release 11.3, V11.3.58
GCC: gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)
PyTorch: 1.12.1+cu113
PyTorch compiling details: PyTorch built with:

  • GCC 9.3
  • C++ Version: 201402
  • Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
  • Intel(R) MKL-DNN v2.6.0 (Git Hash 52b5f107dd9cf10910aaa19cb47f3abf9b349815)
  • OpenMP 201511 (a.k.a. OpenMP 4.5)
  • LAPACK is enabled (usually provided by MKL)
  • NNPACK is enabled
  • CPU capability usage: AVX2
  • CUDA Runtime 11.3
  • NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  • CuDNN 8.3.2 (built against CUDA 11.5)
  • Magma 2.5.2
  • Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.3, CUDNN_VERSION=8.3.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -fopenmp -DNDEBUG -DUSE_KINETO -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -DEDGE_PROFILER_USE_KINETO -O2 -fPIC -Wno-narrowing -Wall -Wextra -Werror=return-type -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-unused-local-typedefs -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=1.12.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=OFF, USE_MPI=OFF, USE_NCCL=ON, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF,

TorchVision: 0.13.1+cu113
OpenCV: 4.6.0
MMCV: 1.7.0
MMCV Compiler: GCC 9.3
MMCV CUDA Compiler: 11.3
MMRotate: 0.3.3+

Reproduces the problem - code sample

When I set GPU_ids=[2] and train with configs/rotated_atss/rotated_atss_obb_r50_fpn_1x_dota_le135.py, I get the following error.

Reproduces the problem - command or script

RuntimeError: points_in_polygons_forward_impl: at param 2, inconsistent device: cuda:0 vs cuda:2

Reproduces the problem - error message

Traceback (most recent call last):
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/tools/train.py", line 194, in <module>
    main()
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/tools/train.py", line 190, in main
    meta=meta)
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/apis/train.py", line 141, in train_detector
    runner.run(data_loaders, cfg.workflow)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 136, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 53, in train
    self.run_iter(data_batch, train_mode=True, **kwargs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/epoch_based_runner.py", line 32, in run_iter
    **kwargs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/parallel/data_parallel.py", line 77, in train_step
    return self.module.train_step(*inputs[0], **kwargs[0])
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 248, in train_step
    losses = self(**data)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 119, in new_func
    return old_func(*args, **kwargs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmdet/models/detectors/base.py", line 172, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/models/detectors/single_stage.py", line 82, in forward_train
    gt_labels, gt_bboxes_ignore)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmdet/models/dense_heads/base_dense_head.py", line 335, in forward_train
    losses = self.loss(*loss_inputs, gt_bboxes_ignore=gt_bboxes_ignore)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/runner/fp16_utils.py", line 208, in new_func
    return old_func(*args, **kwargs)
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/models/dense_heads/rotated_anchor_head.py", line 485, in loss
    label_channels=label_channels)
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/models/dense_heads/rotated_atss_head.py", line 209, in get_targets
    unmap_outputs=unmap_outputs)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmdet/core/utils/misc.py", line 30, in multi_apply
    return tuple(map(list, zip(*map_results)))
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/models/dense_heads/rotated_atss_head.py", line 83, in _get_targets_single
    None if self.sampling else gt_labels)
  File "/home/psdz/PYH/paper2/mmrotate-0.3.3/mmrotate/core/bbox/assigners/atss_obb_assigner.py", line 126, in assign
    inside_flag = points_in_polygons(bboxes_points, gt_bboxes)
  File "/root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/ops/points_in_polygons.py", line 37, in points_in_polygons
    polygons.contiguous(), output)
RuntimeError: points_in_polygons_forward_impl: at param 2, inconsistent device: cuda:0 vs cuda:2

Exception raised from Dispatch at /tmp/mmcv/mmcv/ops/csrc/common/pytorch_device_registry.hpp:116 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x3e (0x7f9c2521f20e in /root/miniconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5c (0x7f9c251fa5e8 in /root/miniconda3/envs/py37/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: auto Dispatch<DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int), &(points_in_polygons_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int))>, at::Tensor const&, at::Tensor const&, at::Tensor&, int const&, int const&>(DeviceRegistry<void (*)(at::Tensor, at::Tensor, at::Tensor, int, int), &(points_in_polygons_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int))> const&, char const*, at::Tensor const&, at::Tensor const&, at::Tensor&, int const&, int const&) + 0x444 (0x7f9b22d2e6e4 in /root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #3: points_in_polygons_forward_impl(at::Tensor, at::Tensor, at::Tensor, int, int) + 0x5b (0x7f9b22d2dfab in /root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #4: points_in_polygons_forward(at::Tensor, at::Tensor, at::Tensor) + 0x103 (0x7f9b22d2e0f3 in /root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x2c1ac1 (0x7f9b22d47ac1 in /root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x2d64b1 (0x7f9b22d5c4b1 in /root/miniconda3/envs/py37/lib/python3.7/site-packages/mmcv/_ext.cpython-37m-x86_64-linux-gnu.so)
frame #7: _PyMethodDef_RawFastCallKeywords + 0x254 (0x55e4826807a4 in /root/miniconda3/envs/py37/bin/python)
frame #8: <unknown function> + 0x17fb40 (0x55e4826b6b40 in /root/miniconda3/envs/py37/bin/python)
frame #9: _PyEval_EvalFrameDefault + 0x4762 (0x55e4826fe702 in /root/miniconda3/envs/py37/bin/python)
frame #10: _PyFunction_FastCallKeywords + 0x187 (0x55e48266f8d7 in /root/miniconda3/envs/py37/bin/python)
frame #11: _PyEval_EvalFrameDefault + 0x3f5 (0x55e4826fa395 in /root/miniconda3/envs/py37/bin/python)
frame #12: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #13: _PyFunction_FastCallKeywords + 0x521 (0x55e48266fc71 in /root/miniconda3/envs/py37/bin/python)
frame #14: <unknown function> + 0x17f9c5 (0x55e4826b69c5 in /root/miniconda3/envs/py37/bin/python)
frame #15: _PyEval_EvalFrameDefault + 0x661 (0x55e4826fa601 in /root/miniconda3/envs/py37/bin/python)
frame #16: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #17: _PyObject_FastCallDict + 0x5be (0x55e48265183e in /root/miniconda3/envs/py37/bin/python)
frame #18: <unknown function> + 0x12f1c3 (0x55e4826661c3 in /root/miniconda3/envs/py37/bin/python)
frame #19: PyObject_Call + 0xb4 (0x55e482651b94 in /root/miniconda3/envs/py37/bin/python)
frame #20: <unknown function> + 0x20f984 (0x55e482746984 in /root/miniconda3/envs/py37/bin/python)
frame #21: _PyObject_FastCallDict + 0x24f (0x55e4826514cf in /root/miniconda3/envs/py37/bin/python)
frame #22: <unknown function> + 0x1225b1 (0x55e4826595b1 in /root/miniconda3/envs/py37/bin/python)
frame #23: PySequence_Tuple + 0x194 (0x55e482688b54 in /root/miniconda3/envs/py37/bin/python)
frame #24: _PyEval_EvalFrameDefault + 0x56f2 (0x55e4826ff692 in /root/miniconda3/envs/py37/bin/python)
frame #25: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #26: _PyFunction_FastCallKeywords + 0x583 (0x55e48266fcd3 in /root/miniconda3/envs/py37/bin/python)
frame #27: <unknown function> + 0x17f9c5 (0x55e4826b69c5 in /root/miniconda3/envs/py37/bin/python)
frame #28: _PyEval_EvalFrameDefault + 0x1401 (0x55e4826fb3a1 in /root/miniconda3/envs/py37/bin/python)
frame #29: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #30: _PyFunction_FastCallKeywords + 0x583 (0x55e48266fcd3 in /root/miniconda3/envs/py37/bin/python)
frame #31: <unknown function> + 0x17f9c5 (0x55e4826b69c5 in /root/miniconda3/envs/py37/bin/python)
frame #32: _PyEval_EvalFrameDefault + 0x1401 (0x55e4826fb3a1 in /root/miniconda3/envs/py37/bin/python)
frame #33: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #34: _PyFunction_FastCallDict + 0x3ec (0x55e48266efcc in /root/miniconda3/envs/py37/bin/python)
frame #35: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #36: _PyEval_EvalCodeWithName + 0xdf9 (0x55e482650a29 in /root/miniconda3/envs/py37/bin/python)
frame #37: _PyObject_FastCallDict + 0x5be (0x55e48265183e in /root/miniconda3/envs/py37/bin/python)
frame #38: <unknown function> + 0x12f1c3 (0x55e4826661c3 in /root/miniconda3/envs/py37/bin/python)
frame #39: PyObject_Call + 0xb4 (0x55e482651b94 in /root/miniconda3/envs/py37/bin/python)
frame #40: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #41: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #42: _PyFunction_FastCallKeywords + 0x583 (0x55e48266fcd3 in /root/miniconda3/envs/py37/bin/python)
frame #43: <unknown function> + 0x17f9c5 (0x55e4826b69c5 in /root/miniconda3/envs/py37/bin/python)
frame #44: _PyEval_EvalFrameDefault + 0x4762 (0x55e4826fe702 in /root/miniconda3/envs/py37/bin/python)
frame #45: _PyEval_EvalCodeWithName + 0x7cd (0x55e4826503fd in /root/miniconda3/envs/py37/bin/python)
frame #46: _PyObject_FastCallDict + 0x5be (0x55e48265183e in /root/miniconda3/envs/py37/bin/python)
frame #47: <unknown function> + 0x12f141 (0x55e482666141 in /root/miniconda3/envs/py37/bin/python)
frame #48: PyObject_Call + 0xb4 (0x55e482651b94 in /root/miniconda3/envs/py37/bin/python)
frame #49: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #50: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #51: _PyFunction_FastCallDict + 0x3ec (0x55e48266efcc in /root/miniconda3/envs/py37/bin/python)
frame #52: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #53: _PyEval_EvalCodeWithName + 0xdf9 (0x55e482650a29 in /root/miniconda3/envs/py37/bin/python)
frame #54: _PyObject_FastCallDict + 0x5be (0x55e48265183e in /root/miniconda3/envs/py37/bin/python)
frame #55: <unknown function> + 0x12f141 (0x55e482666141 in /root/miniconda3/envs/py37/bin/python)
frame #56: PyObject_Call + 0xb4 (0x55e482651b94 in /root/miniconda3/envs/py37/bin/python)
frame #57: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #58: _PyEval_EvalCodeWithName + 0x255 (0x55e48264fe85 in /root/miniconda3/envs/py37/bin/python)
frame #59: _PyObject_FastCallDict + 0x5be (0x55e48265183e in /root/miniconda3/envs/py37/bin/python)
frame #60: <unknown function> + 0x1864bc (0x55e4826bd4bc in /root/miniconda3/envs/py37/bin/python)
frame #61: PyObject_Call + 0xb4 (0x55e482651b94 in /root/miniconda3/envs/py37/bin/python)
frame #62: _PyEval_EvalFrameDefault + 0x1cb8 (0x55e4826fbc58 in /root/miniconda3/envs/py37/bin/python)
frame #63: _PyObject_FastCallDict + 0x1b6 (0x55e482651436 in /root/miniconda3/envs/py37/bin/python)

Process finished with exit code 1
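Per the "Exception raised from Dispatch" line, this message comes from a device check in MMCV's C++ dispatcher (pytorch_device_registry.hpp) that runs before any CUDA kernel is launched. The following is a hypothetical pure-Python model of that check, not MMCV's code. It assumes that every tensor argument is compared against the device of the first tensor argument, that parameters are counted from zero, and that the arguments arrive in the order points, polygons, output. Under those assumptions, "param 2" is the output buffer: it ends up on cuda:0 (presumably because it is created with a bare `.cuda()` call, which targets the current device) while the inputs were moved to cuda:2.

```python
from typing import Optional, Sequence


def check_device_consistency(name: str, devices: Sequence[Optional[str]]) -> None:
    """Hypothetical sketch of a dispatcher-side device check.

    devices -- device string of each argument, in call order;
               None stands for a non-tensor argument, which is skipped.
    Raises RuntimeError if any tensor argument is on a different device
    than the first tensor argument.
    """
    # The first tensor argument defines the reference device.
    expected = next((d for d in devices if d is not None), None)
    for idx, device in enumerate(devices):
        if device is not None and device != expected:
            raise RuntimeError(
                f"{name}: at param {idx}, inconsistent device: "
                f"{device} vs {expected}"
            )


# Assumed layout for this report: points and polygons on cuda:2,
# output buffer on the current device, cuda:0.
try:
    check_device_consistency(
        "points_in_polygons_forward_impl", ["cuda:2", "cuda:2", "cuda:0"]
    )
except RuntimeError as err:
    # prints: points_in_polygons_forward_impl: at param 2, inconsistent device: cuda:0 vs cuda:2
    print(err)
```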

Additional information

No response

@zytx121
Collaborator

zytx121 commented Nov 13, 2022

You can try:

CUDA_VISIBLE_DEVICES=2 ./tools/dist_train.sh configs/rotated_atss/rotated_atss_obb_r50_fpn_1x_dota_le135.py 1
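This works around the mismatch because `CUDA_VISIBLE_DEVICES` both restricts and renumbers the GPUs a process can see: with `CUDA_VISIBLE_DEVICES=2`, physical GPU 2 is exposed as logical device cuda:0, so the current device and the device holding the training inputs refer to the same card. The sketch below models only that renumbering. The `logical_to_physical` helper is hypothetical and handles plain comma-separated integer IDs only (not GPU UUIDs or invalid entries).

```python
from typing import Dict, Optional


def logical_to_physical(visible: Optional[str], num_gpus: int) -> Dict[int, int]:
    """Hypothetical helper: map logical CUDA indices (cuda:0, cuda:1, ...)
    to physical GPU indices.

    visible  -- value of CUDA_VISIBLE_DEVICES, or None if it is unset
    num_gpus -- number of physical GPUs in the machine
    """
    if visible is None:
        # Unset: every GPU is visible under its own physical index.
        return {i: i for i in range(num_gpus)}
    ids = [int(tok) for tok in visible.split(",") if tok.strip()]
    # Visible GPUs are renumbered from 0 in the order they are listed.
    return dict(enumerate(ids))


# The three RTX 3090s in this report (GPU 0,1,2):
print(logical_to_physical(None, 3))  # {0: 0, 1: 1, 2: 2}
print(logical_to_physical("2", 3))   # {0: 2}
```

With the variable set to 2, the single training process started by `dist_train.sh ... 1` sees exactly one GPU, and its cuda:0 is the card the user originally wanted.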

zytx121 closed this as completed Dec 6, 2022