Bug encountered during eval while training table recognition #13268

liuzhipengchd · 2024-07-05T03:06:33Z

PaddleOCR/ppocr/modeling/heads/table_att_head.py

Line 370 in 7a3c580

structure_ids = paddle.zeros(

[Hint: Expected data_type == phi::DataType::FLOAT16 || data_type == phi::DataType::BFLOAT16 == true, but received data_type == phi::DataType::FLOAT16 || data_type == phi::DataType::BFLOAT16:0 != true:1.] (at ../paddle/fluid/imperative/amp_auto_cast.cc:190)

During evaluation, the data type of structure_ids does not match that of pre_chars, which causes the bug.
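To confirm a mismatch like this before changing the model code, it can help to list the dtypes of the tensors involved. The sketch below is only a debugging aid, not code from PaddleOCR: `group_by_dtype` and `assert_same_dtype` are hypothetical helpers that accept any objects exposing a `.dtype` attribute (paddle tensors do), and the stand-in objects and their dtypes are assumptions chosen for illustration.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_dtype(named_tensors):
    """Map each dtype (as a string) to the names of the tensors that use it."""
    groups = defaultdict(list)
    for name, tensor in named_tensors.items():
        groups[str(tensor.dtype)].append(name)
    return dict(groups)


def assert_same_dtype(named_tensors):
    """Raise TypeError when the given tensors do not all share one dtype."""
    groups = group_by_dtype(named_tensors)
    if len(groups) > 1:
        raise TypeError(f"dtype mismatch: {groups}")


# Stand-ins for real tensors; these dtypes are assumed, not read from the model.
pre_chars = SimpleNamespace(dtype="int32")
structure_ids = SimpleNamespace(dtype="int64")

print(group_by_dtype({"pre_chars": pre_chars, "structure_ids": structure_ids}))
```

In a real debugging session the same helpers could be called with the actual tensors inside the head's forward pass, just before the line that raises.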

GreatV · 2024-07-05T03:18:00Z

What is your runtime environment? Which paddle version and which paddleocr version are you using?

liuzhipengchd · 2024-07-05T03:28:06Z

What is your runtime environment? Which paddle version and which paddleocr version are you using?

paddleocr is 2.8, paddlepaddle-gpu == 0.0.0.post112
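Version details like these can be collected in one step when filing a report. The following is a minimal sketch that uses only the standard library, so it runs even if the packages are missing; the package names queried are the ones mentioned in this thread plus `paddlepaddle` (the CPU distribution name), and `installed_version` and `report` are hypothetical helper names.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def report(packages=("paddlepaddle-gpu", "paddlepaddle", "paddleocr")):
    """Collect the installed version (or None) for each package of interest."""
    return {name: installed_version(name) for name in packages}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

The GPU model and whether AMP is enabled still have to be stated separately, since they are not part of the package metadata.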

GreatV · 2024-07-05T03:45:30Z

Which GPU are you using? Are you training in AMP mode? Does it run without AMP?

liuzhipengchd · 2024-07-05T03:49:48Z

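Which GPU are you using? Are you training in AMP mode? Does it run without AMP?

A 4090, in normal mode. Changing that data type to int32 everywhere makes it run. Isn't that the bug?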

GreatV · 2024-07-05T04:16:20Z

OK @liuzhipengchd, thanks for the feedback. Could you open a PR to fix it?

GreatV · 2024-07-05T04:19:13Z

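A 4090, in normal mode. Changing that data type to int32 everywhere makes it run. Isn't that the bug?

I asked because this looks like a bug caused by AMP being enabled.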

liuzhipengchd · 2024-07-05T05:46:32Z

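A 4090, in normal mode. Changing that data type to int32 everywhere makes it run. Isn't that the bug?

I asked because this looks like a bug caused by AMP being enabled.

OK. I have two questions. 1. For training a table recognition model, which works better, SLANet_lcnetv2.yml or SLANet_ch.yml (and what if I choose scale 2.5)? 2. In SLANet_lcnetv2, if I want to use PPLCNetV2_large, how should I configure it?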

GreatV · 2024-07-05T06:09:26Z

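I have two questions. 1. For training a table recognition model, which works better, SLANet_lcnetv2.yml or SLANet_ch.yml (and what if I choose scale 2.5)? 2. In SLANet_lcnetv2, if I want to use PPLCNetV2_large, how should I configure it?

SLANet_lcnetv2.yml should be a bit better. For the details you would need to ask @invictuszhao.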

GreatV · 2024-07-06T07:50:33Z

@liuzhipengchd I cannot reproduce your problem.

 python3 tools/train.py -c configs/table/SLANet.yml -o Train.loader.batch_size_per_card=16 Eval.loader.batch_size_per_card=16

[2024/07/06 07:43:04] ppocr INFO: epoch: [1/100], global_step: 1000, lr: 0.001000, acc: 0.000000, loss: 0.083020, structure_loss: 0.055820, loc_loss: 0.026588, avg_reader_cost: 0.00075 s, avg_batch_cost: 0.28300 s, avg_samples: 16.0, ips: 56.53801 samples/s, eta: 10 days, 9:10:27, max_mem_reserved: 8722 MB, max_mem_allocated: 8043 MB
eval model:: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 570/570 [01:45<00:00,  5.41it/s]
[2024/07/06 07:44:49] ppocr INFO: cur metric, acc: 0.026001097089851773, fps: 94.59463124211499
[2024/07/06 07:44:49] ppocr INFO: save best model is to ./output/SLANet/best_accuracy
[2024/07/06 07:44:49] ppocr INFO: best metric, acc: 0.026001097089851773, is_float16: False, fps: 94.59463124211499, best_epoch: 1
[2024/07/06 07:44:55] ppocr INFO: epoch: [1/100], global_step: 1020, lr: 0.001000, acc: 0.000000, loss: 0.078646, structure_loss: 0.050355, loc_loss: 0.032018, avg_reader_cost: 0.00104 s, avg_batch_cost: 0.27847 s, avg_samples: 16.0, ips: 57.45583 samples/s, eta: 10 days, 9:01:56, max_mem_reserved: 8722 MB, max_mem_allocated: 8043 MB

python3 tools/train.py -c configs/table/SLANet.yml -o Train.loader.batch_size_per_card=16 Eval.loader.batch_size_per_card=16 Global.use_amp=True

[2024/07/06 07:59:55] ppocr INFO: epoch: [1/100], global_step: 1000, lr: 0.001000, acc: 0.000000, loss: 0.084489, structure_loss: 0.057010, loc_loss: 0.028551, avg_reader_cost: 0.00082 s, avg_batch_cost: 0.26592 s, avg_samples: 16.0, ips: 60.16862 samples/s, eta: 9 days, 18:05:22, max_mem_reserved: 4516 MB, max_mem_allocated: 4079 MB
eval model:: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 570/570 [01:59<00:00,  4.78it/s]
[2024/07/06 08:01:55] ppocr INFO: cur metric, acc: 0.005485463521065774, fps: 82.6024714859491
[2024/07/06 08:01:55] ppocr INFO: save best model is to ./output/SLANet/best_accuracy
[2024/07/06 08:01:55] ppocr INFO: best metric, acc: 0.005485463521065774, is_float16: False, fps: 82.6024714859491, best_epoch: 1
[2024/07/06 08:02:00] ppocr INFO: epoch: [1/100], global_step: 1020, lr: 0.001000, acc: 0.000000, loss: 0.078040, structure_loss: 0.051105, loc_loss: 0.030460, avg_reader_cost: 0.00108 s, avg_batch_cost: 0.26121 s, avg_samples: 16.0, ips: 61.25244 samples/s, eta: 9 days, 17:56:58, max_mem_reserved: 4691 MB, max_mem_allocated: 4079 MB
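The two runs above differ only in `Global.use_amp=True`, and the lines worth comparing are the eval metric and the peak memory. As a rough sketch, the relevant numbers can be pulled out of such logs with regular expressions; the patterns below are written against the log lines shown in this comment and are an assumption about the format, not a documented PaddleOCR interface. The sample strings are abbreviated copies of those lines.

```python
import re

# Patterns matched against the "ppocr INFO" lines quoted in this thread.
METRIC_RE = re.compile(r"cur metric, acc: (?P<acc>[\d.]+), fps: (?P<fps>[\d.]+)")
MEM_RE = re.compile(r"max_mem_allocated: (?P<mem>\d+) MB")


def summarize(log_text):
    """Extract eval accuracy, eval fps and peak allocated memory from a log."""
    metric = METRIC_RE.search(log_text)
    mem = MEM_RE.search(log_text)
    return {
        "acc": float(metric.group("acc")) if metric else None,
        "fps": float(metric.group("fps")) if metric else None,
        "max_mem_allocated_mb": int(mem.group("mem")) if mem else None,
    }


# Abbreviated excerpts of the two runs (default precision, then AMP).
DEFAULT_LOG = (
    "ppocr INFO: epoch: [1/100], global_step: 1000, max_mem_allocated: 8043 MB\n"
    "ppocr INFO: cur metric, acc: 0.026001097089851773, fps: 94.59463124211499\n"
)
AMP_LOG = (
    "ppocr INFO: epoch: [1/100], global_step: 1000, max_mem_allocated: 4079 MB\n"
    "ppocr INFO: cur metric, acc: 0.005485463521065774, fps: 82.6024714859491\n"
)

for label, text in (("default", DEFAULT_LOG), ("amp", AMP_LOG)):
    print(label, summarize(text))
```

Both evaluations complete without the reported error. The accuracies are taken after roughly 1000 steps of the first epoch, so the gap between them probably says little about AMP itself, while the drop in allocated memory is consistent with AMP being active in the second run.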

liuzhipengchd · 2024-07-08T00:59:39Z

@liuzhipengchd I cannot reproduce your problem.

I can reproduce the problem on my side.
Keeping the types the same fixes it. (Will keeping the types the same affect the training results?)

GreatV · 2024-07-08T01:56:09Z

@liuzhipengchd It will not affect the results. This has already been fixed.

unifying data types in the SLAHead #13276
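One way to see why unifying integer dtypes should leave results unchanged: the tensors in question hold token indices, and small non-negative integers have the same value in a 32-bit or a 64-bit integer type. The sketch below checks this with `ctypes`. The sample ids are hypothetical, and the assumption that structure token ids are far below 2**31 is mine, not something stated in the thread.

```python
import ctypes


def roundtrip_int32(value):
    """Store a Python int in a 32-bit signed integer and read it back."""
    return ctypes.c_int32(value).value


def survives_int32(ids):
    """True when every id keeps its exact value after conversion to int32."""
    return all(roundtrip_int32(i) == i for i in ids)


# Hypothetical token ids, standing in for indices into a small vocabulary.
token_ids = [0, 1, 29, 499]

print(survives_int32(token_ids))  # ids this small fit in int32 unchanged
print(survives_int32([2**31]))    # a value past the int32 range would not
```

Only values outside the int32 range would be altered by such a conversion.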

GreatV added the labels bug (Something isn't working) and Code PR is needed (This issue could inspire a code PR) on Jul 5, 2024

GreatV mentioned this issue Jul 6, 2024

unifying data types in the SLAHead #13276

Merged

GreatV linked a pull request Jul 6, 2024 that will close this issue

unifying data types in the SLAHead #13276

Merged

GreatV closed this as completed in #13276 Jul 6, 2024
