RuntimeError: Trying to create tensor with negative dimension #260

Open
Rusteam opened this issue Jul 1, 2022 · 5 comments

Comments

Rusteam commented Jul 1, 2022

Hi there,

During training, I get the following error at the test stage after some number of epochs:

Traceback (most recent call last):
  File "/usr/src/app/pipelines/yolor/../../src/models/yolor/train.py", line 537, in <module>
    train(hyp, opt, device, tb_writer, wandb)
  File "/usr/src/app/pipelines/yolor/../../src/models/yolor/train.py", line 336, in train
    results, maps, times = test.test(opt.data,
  File "/usr/src/app/src/models/yolor/test.py", line 134, in test
    output = non_max_suppression(inf_out, conf_thres=conf_thres, iou_thres=iou_thres)
  File "/usr/src/app/src/models/yolor/utils/general.py", line 341, in non_max_suppression
    i = torch.ops.torchvision.nms(boxes, scores, iou_thres)
  File "/usr/local/lib/python3.9/dist-packages/torch/_ops.py", line 142, in __call__
    return self._op(*args, **kwargs or {})
RuntimeError: Trying to create tensor with negative dimension -726820594: [-726820594]

My env:

torch=='1.12.0.dev20220314+cu102'
torchvision=='0.13.0.dev20220314+cu102'
python 3.9.10

YiCheno commented Jul 6, 2022

Hi there,

I got the same error message with batch_size=2.
I think the error is related to batch_size, because when I tried changing it to batch_size=3, the error disappeared.

I don't know the full cause of the error, but I can train on my dataset this way.
If I find the reason, I will post it here.

Hope this helps.

My env:

python 3.7
torch==1.7.0+cu101 
torchvision==0.8.1+cu101 
torchaudio==0.7.0

GPU: RTX2080Ti 11G

Rusteam commented Jul 6, 2022

I'm not sure it is the batch size, because the error only appears after some number of epochs. For example, training and testing run fine for 15 epochs, and then this error is suddenly thrown.

Also, the value looks like it could be a box coordinate, and it should not be that large.

YiCheno commented Jul 15, 2022

> I'm not sure about batch size, because it happens after some number of epochs. Let's say it has been training fine and testing fine for 15 epochs and then suddenly it throws this error.
>
> Also it feels that the value is a box coordinate and it should not be that high.

Update: I debugged the code.
In ./utils/general.py, I found the cause of this error.
In lines 320 to 350 of that file, you can see the following code:

320     # Box (center x, center y, width, height) to (x1, y1, x2, y2)
321     box = xywh2xyxy(x[:, :4])
... ...

347     # Batched NMS
348     c = x[:, 5:6] * (0 if agnostic else max_wh)  # classes
349     boxes, scores = x[:, :4] + c, x[:, 4]  # boxes (offset by class), scores
350     i = torch.ops.torchvision.nms(boxes, scores, iou_thres)

You can debug this yourself while training your model. At line 350, you will see that boxes is very large, and that boxes (line 350) and box (line 321) are of float32 and float16 type on the GPU, so I think this is where the error occurs.
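
The class offset at lines 348 to 349 can be illustrated with a minimal pure-Python sketch. It assumes max_wh = 4096 (an assumed value; check the constant in your copy of general.py), and the helper names iou and offset_by_class are hypothetical, not taken from the repository. Adding class_id * max_wh to every coordinate moves each class into its own region, so a single NMS call does not suppress boxes across classes.

```python
MAX_WH = 4096  # assumed maximum box width/height in pixels; not taken from the thread


def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def offset_by_class(box, class_id, max_wh=MAX_WH):
    """Shift all four coordinates by class_id * max_wh (hypothetical helper)."""
    shift = class_id * max_wh
    return tuple(v + shift for v in box)


box = (10.0, 20.0, 110.0, 220.0)
# Same box, same class: the offset cancels out and the overlap is complete.
same_class = iou(offset_by_class(box, 1), offset_by_class(box, 1))
# Same box, different classes: the offset separates them, so they never overlap.
other_class = iou(offset_by_class(box, 0), offset_by_class(box, 1))
print(same_class, other_class)
```

Note that the offset does nothing to reduce the number of boxes passed to NMS; it only keeps classes apart.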

My solution:
I changed the default value of conf_thres at line 35 of ./test.py, as follows:

31    def test(data,
32             weights=None,
33             batch_size=16,
34             imgsz=640,
35             conf_thres=0.001,
36             iou_thres=0.6,  # for NMS
37             save_json=False,
38             single_cls=False,
39             augment=False,
40             verbose=False,
41             model=None,
42             dataloader=None,
43             save_dir=Path(''),  # for saving images
44             save_txt=False,  # for auto-labelling
45             save_conf=False,
46             plots=True,
47             log_imgs=0):  # number of logged images

# After modification.

31    def test(data,
32             weights=None,
33             batch_size=16,
34             imgsz=640,
35             conf_thres=0.01,
36             iou_thres=0.6,  # for NMS
37             save_json=False,
38             single_cls=False,
39             augment=False,
40             verbose=False,
41             model=None,
42             dataloader=None,
43             save_dir=Path(''),  # for saving images
44             save_txt=False,  # for auto-labelling
45             save_conf=False,
46             plots=True,
47             log_imgs=0):  # number of logged images

This change eliminates the error.
Hope this helps, @Rusteam.
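
To see why raising conf_thres shrinks the input to NMS, here is a minimal sketch in plain Python. It assumes each prediction row is laid out as (x, y, w, h, confidence, class), matching the x[:, 4] score column quoted above, and that rows are kept with a strict "greater than" comparison. The sample rows and the helper name filter_by_conf are made up for illustration.

```python
def filter_by_conf(rows, conf_thres):
    """Keep rows whose confidence (index 4) is above conf_thres."""
    return [r for r in rows if r[4] > conf_thres]


# Made-up predictions: (x, y, w, h, confidence, class)
rows = [
    (50.0, 50.0, 20.0, 20.0, 0.900, 0),
    (52.0, 51.0, 20.0, 20.0, 0.005, 0),
    (300.0, 80.0, 10.0, 10.0, 0.020, 1),
    (10.0, 10.0, 5.0, 5.0, 0.002, 1),
]

kept_default = filter_by_conf(rows, conf_thres=0.001)  # original default
kept_raised = filter_by_conf(rows, conf_thres=0.01)    # modified default
print(len(kept_default), len(kept_raised))
```

With a real model the low-confidence rows vastly outnumber the confident ones, so even a small increase in the threshold can remove most of the candidates.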

Rusteam commented Jul 15, 2022

Did it help?

YiCheno commented Jul 16, 2022

Yes, this method helped me.

I use my own dataset with YOLOR.
My dataset is for small-object detection and I changed YOLOR's architecture, which is why so many boxes were produced.
Raising conf_thres removes a lot of those boxes. You should adjust conf_thres according to your dataset and model architecture.
Note that you should not let the number of boxes become too small, since the model still uses them. If you want concrete parameter values, you can refer to the official example.
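
One way to balance "not too many boxes" against "not too few" is to try several thresholds and take the lowest one that keeps the candidate count under a budget. The sketch below is an illustration under assumptions, not something from the YOLOR code: pick_conf_thres, the candidate thresholds, the box budget, and the sample scores are all hypothetical.

```python
def pick_conf_thres(confidences, thresholds, max_boxes):
    """Return the lowest threshold that keeps at most max_boxes scores.

    Returns None if even the highest threshold keeps too many.
    """
    for t in sorted(thresholds):
        kept = sum(1 for c in confidences if c > t)
        if kept <= max_boxes:
            return t
    return None


# Made-up confidence scores for one image
confidences = [0.95, 0.40, 0.03, 0.008, 0.004, 0.002, 0.0015]
chosen = pick_conf_thres(
    confidences,
    thresholds=[0.001, 0.005, 0.01, 0.05],
    max_boxes=3,  # hypothetical budget; pick one that suits your data
)
print(chosen)
```

In practice the scores would be collected from a few validation batches, and the budget would be chosen so that NMS stays fast without discarding boxes that matter for evaluation.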
