Error when running my own dataset with the official BiseNetV1 config file #3401

loxoo6 · 2023-07-24T01:45:43Z

Search before asking

I have searched the open and closed issues and found no similar bug report.

Describe the Bug

2023-07-24 09:40:35 [INFO]
------------Environment Information-------------
platform: Linux-4.15.0-140-generic-x86_64-with-debian-stretch-sid
Python: 3.7.4 (default, Aug 13 2019, 20:35:49) [GCC 7.3.0]
Paddle compiled with cuda: True
NVCC: Build cuda_11.2.r11.2/compiler.29618528_0
cudnn: 8.2
GPUs used: 1
CUDA_VISIBLE_DEVICES: None
GPU: ['GPU 0: Tesla V100-SXM2-32GB']
GCC: gcc (Ubuntu 7.5.0-3ubuntu1~16.04) 7.5.0
PaddleSeg: 2.7.0
PaddlePaddle: 2.3.2
OpenCV: 4.1.1

2023-07-24 09:40:35 [INFO]
---------------Config Information---------------
batch_size: 4
iters: 160000
loss:
  coef:
  - 1
  - 1
  - 1
  types:
  - ignore_index: 255
    type: OhemCrossEntropyLoss
  - ignore_index: 255
    type: OhemCrossEntropyLoss
  - ignore_index: 255
    type: OhemCrossEntropyLoss
lr_scheduler:
  end_lr: 0.0
  learning_rate: 0.01
  power: 0.9
  type: PolynomialDecay
model:
  backbone:
    in_channels: 3
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
    type: ResNet18_vd
  num_classes: 2
  type: BiseNetV1
optimizer:
  type: sgd
  weight_decay: 0.0005
train_dataset:
  dataset_root: /home/aistudio/PaddleSeg/data
  img_channels: 3
  mode: train
  num_classes: 2
  train_path: /home/aistudio/PaddleSeg/data/train_list.txt
  transforms:
  - max_scale_factor: 2.0
    min_scale_factor: 0.5
    scale_step_size: 0.25
    type: ResizeStepScaling
  - crop_size:
    - 512
    - 512
    type: RandomPaddingCrop
  - type: RandomHorizontalFlip
  - type: RandomDistort
  - type: Normalize
  type: Dataset
val_dataset:
  dataset_root: /home/aistudio/PaddleSeg/data
  img_channels: 3
  mode: val
  num_classes: 2
  transforms:
  - type: Normalize
  type: Dataset
  val_path: /home/aistudio/PaddleSeg/data/val_list.txt

W0724 09:40:35.476796 7755 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 11.2, Runtime API Version: 11.2
W0724 09:40:35.476837 7755 gpu_resources.cc:91] device: 0, cuDNN Version: 8.2.
2023-07-24 09:40:36 [INFO] Loading pretrained model from https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
Connecting to https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz
Downloading resnet18_vd_ssld_v2.tar.gz
[==================================================] 100.00%
Uncompress resnet18_vd_ssld_v2.tar.gz
[==================================================] 100.00%
2023-07-24 09:40:38 [INFO] There are 115/115 variables loaded into ResNet_vd.
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/nn/layer/norm.py:654: UserWarning: When training, we now always track global mean and variance.
"When training, we now always track global mean and variance.")
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/math_op_patch.py:278: UserWarning: The dtype of left and right variables are not the same, left dtype is paddle.float32, but right dtype is paddle.int64, the right dtype will convert to paddle.float32
format(lhs_dtype, rhs_dtype, lhs_dtype))
2023-07-24 09:40:55 [INFO] [TRAIN] epoch: 1, iter: 10/160000, loss: 3.4389, lr: 0.009999, batch_cost: 1.6020, reader_cost: 1.16270, ips: 2.4968 samples/sec | ETA 71:11:48
2023-07-24 09:41:10 [INFO] [TRAIN] epoch: 1, iter: 20/160000, loss: 5.1390, lr: 0.009999, batch_cost: 1.4837, reader_cost: 1.32117, ips: 2.6959 samples/sec | ETA 65:56:05
2023-07-24 09:41:25 [INFO] [TRAIN] epoch: 1, iter: 30/160000, loss: 2.9624, lr: 0.009998, batch_cost: 1.5491, reader_cost: 1.39400, ips: 2.5821 samples/sec | ETA 68:50:10
2023-07-24 09:41:40 [INFO] [TRAIN] epoch: 1, iter: 40/160000, loss: 2.4292, lr: 0.009998, batch_cost: 1.4487, reader_cost: 1.30202, ips: 2.7611 samples/sec | ETA 64:22:17
2023-07-24 09:41:55 [INFO] [TRAIN] epoch: 1, iter: 50/160000, loss: 2.5561, lr: 0.009997, batch_cost: 1.5167, reader_cost: 1.35702, ips: 2.6373 samples/sec | ETA 67:23:18
2023-07-24 09:41:55 [INFO] Start evaluating (total_samples: 30, total_iters: 30)...
Traceback (most recent call last):
  File "tools/train.py", line 262, in <module>
    main(args)
  File "tools/train.py", line 257, in main
    to_static_training=cfg.to_static_training)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/core/train.py", line 289, in train
    **test_config)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/core/val.py", line 165, in evaluate
    ignore_index=eval_dataset.ignore_index)
  File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddleseg/utils/metrics.py", line 43, in calculate_area
    label.shape))
ValueError: Shape of pred and `label should be equal, but there are [1, 4032, 2272] and [1, 4032, 2268].
terminate called without an active exception

C++ Traceback (most recent call last):

No stack trace in paddle, may be caused by external reasons.

Error Message Summary:

FatalError: Process abort signal is detected by the operating system.
[TimeInfo: *** Aborted at 1690162917 (unix time) try "date -d @1690162917" if you are using GNU date ***]
[SignalInfo: *** SIGABRT (@0x3e800001e4b) received by PID 7755 (TID 0x7f04ccaae700) from PID 7755 ***]

Environment

The config file is:
_base_: '../_base_/cityscapes.yml'

batch_size: 4
iters: 160000

model:
  type: BiseNetV1
  backbone:
    type: ResNet18_vd
    output_stride: 8
    pretrained: https://bj.bcebos.com/paddleseg/dygraph/resnet18_vd_ssld_v2.tar.gz

train_dataset:
  type: Dataset
  dataset_root: /home/aistudio/PaddleSeg/data
  train_path: /home/aistudio/PaddleSeg/data/train_list.txt
  num_classes: 2
  mode: train
  transforms:
    - type: ResizeStepScaling
      min_scale_factor: 0.5
      max_scale_factor: 2.0
      scale_step_size: 0.25
    - type: RandomPaddingCrop
      crop_size: [512, 512]
    - type: RandomHorizontalFlip
    - type: RandomDistort
    - type: Normalize

val_dataset:
  type: Dataset
  dataset_root: /home/aistudio/PaddleSeg/data
  val_path: /home/aistudio/PaddleSeg/data/val_list.txt
  num_classes: 2
  mode: val
  transforms:
    - type: Normalize

optimizer:
  type: sgd
  weight_decay: 0.0005

loss:
  types:
    - type: OhemCrossEntropyLoss
    - type: OhemCrossEntropyLoss
    - type: OhemCrossEntropyLoss
  coef: [1, 1, 1]

lr_scheduler:
  type: PolynomialDecay
  learning_rate: 0.01
  end_lr: 0.0
  power: 0.9

Runtime environment:
aistudio
paddlepaddle==2.3.3
paddleseg==2.7.0
python3

Bug description confirmation

I confirm that the bug replication steps, code change instructions, and environment information have been provided, and the problem can be reproduced.

Are you willing to submit a PR?

I'd like to help by submitting a PR!


Asthestarsfalll · 2023-07-24T06:42:20Z

It looks like the ground-truth label and the original image differ in size. Please check your dataset.
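
One way to run that check is a small script that walks the list file and compares each image's size with its label's size. This is only a rough sketch, not part of PaddleSeg: `find_size_mismatches` and `pil_size` are hypothetical helper names, it assumes each line of the list file is `image_path label_path` separated by a single space with paths relative to `dataset_root`, and it assumes Pillow is installed for reading sizes.

```python
import os
from typing import Callable, List, Tuple

Size = Tuple[int, int]  # (width, height)


def pil_size(path: str) -> Size:
    """Read (width, height) with Pillow. Imported lazily so the checker
    itself only depends on the standard library."""
    from PIL import Image

    with Image.open(path) as im:
        return im.size


def find_size_mismatches(
    list_file: str,
    dataset_root: str,
    read_size: Callable[[str], Size] = pil_size,
    separator: str = " ",
) -> List[Tuple[int, str, Size, str, Size]]:
    """Return (line_no, image, image_size, label, label_size) for every
    pair whose sizes differ. The list-file layout is an assumption."""
    mismatches = []
    with open(list_file, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.strip().split(separator)
            if len(parts) != 2:
                continue  # skip blank lines and lines that are not an image/label pair
            img_rel, lbl_rel = parts
            img_size = read_size(os.path.join(dataset_root, img_rel))
            lbl_size = read_size(os.path.join(dataset_root, lbl_rel))
            if img_size != lbl_size:
                mismatches.append((line_no, img_rel, img_size, lbl_rel, lbl_size))
    return mismatches


# Example call with the paths from this issue (uncomment to run):
# root = "/home/aistudio/PaddleSeg/data"
# for row in find_size_mismatches(os.path.join(root, "val_list.txt"), root):
#     print(row)
```

Any pair it reports needs its image or label regenerated or resized so the two match. If it reports nothing for both `train_list.txt` and `val_list.txt`, the mismatch probably comes from somewhere other than the files themselves.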

ToddBear · 2023-08-07T06:55:22Z

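The reply above fully answers the question. If you have new questions, feel free to open an issue at any time, or continue replying under this one.
We have launched an issue-resolution campaign for the PaddlePaddle suites, and interested developers are welcome to join: PaddlePaddle/PaddleOCR#10223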

loxoo6 added the bug Something isn't working label Jul 24, 2023

ToddBear assigned Asthestarsfalll Jul 25, 2023

ToddBear mentioned this issue Jul 25, 2023

CV Suite Development Campaign #3333

Closed

ToddBear closed this as completed Aug 7, 2023
