[MLU]adapt mlu device for running dbnet network #7835

qipengh · 2022-10-08T09:00:10Z

Adapt the Cambricon MLU chip to support the dbnet text detection network.

# 8-card training script
python3 -m paddle.distributed.launch --mlus '0,1,2,3,4,5,6,7' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

The results are as follows:

paddle-bot · 2022-10-08T09:00:14Z

Thanks for your contribution!

qipengh · 2022-10-08T09:20:19Z

tools/train.py

+ if paddle.is_compiled_with_cuda():
+     AMP_RELATED_FLAGS_SETTING.update({
+         'FLAGS_cudnn_batchnorm_spatial_persistent': 1
+     })


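On devices other than GPU, running in AMP mode raises an error, because FLAGS_cudnn_batchnorm_spatial_persistent is a GPU-only flag. This change adds a separate check so that the flag is applied only when paddle.is_compiled_with_cuda() returns True.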
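The check described above can be sketched in isolation as a small helper. This is only an illustration, not the code in tools/train.py: the function name `build_amp_flags` and its `base_flags` parameter are hypothetical, and the build check is passed in as a plain boolean so the sketch runs without Paddle installed.

```python
def build_amp_flags(compiled_with_cuda, base_flags=None):
    """Return a dict of AMP-related flags for the current build.

    compiled_with_cuda: result of the build check, e.g. the value of
        paddle.is_compiled_with_cuda() in a real training script.
    base_flags: hypothetical dict of device-independent flags.
    """
    flags = dict(base_flags or {})
    if compiled_with_cuda:
        # GPU-only flag; per this thread it is rejected on an MLU build.
        flags['FLAGS_cudnn_batchnorm_spatial_persistent'] = 1
    return flags


# On an MLU build the cuDNN flag is never added to the dict.
mlu_flags = build_amp_flags(compiled_with_cuda=False)
gpu_flags = build_amp_flags(compiled_with_cuda=True)
```

In a real script the boolean would come from paddle.is_compiled_with_cuda(), and the resulting dict would be passed to the same set_flags call that appears in the traceback.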

Doesn't this FLAG only control the behavior of the batch norm kernel? Why does it raise an error?

The error message is as follows:

[2022/10/08 12:24:37] ppocr INFO: train dataloader has 8 iters
[2022/10/08 12:24:37] ppocr INFO: valid dataloader has 500 iters
Traceback (most recent call last):
  File "tools/train.py", line 212, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 161, in main
    paddle.fluid.set_flags(AMP_RELATED_FLAGS_SETTING)
  File "/projs/framework/huangqipeng/pdd_a/venv_py38/lib/python3.8/site-packages/paddle/fluid/framework.py", line 7287, in set_flags
    raise ValueError(
ValueError: Flag FLAGS_cudnn_batchnorm_spatial_persistent cannot set its value through this function.
INFO 2022-10-08 12:24:43,989 launch_utils.py:350] terminate all the procs

The likely cause is that in Paddle, at paddle/fluid/platform/flags.cc +297, this flag is registered inside a PADDLE_WITH_CUDA guard, so it cannot be used when Paddle is built with PADDLE_WITH_MLU.

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
......
PADDLE_DEFINE_EXPORTED_bool(
    cudnn_batchnorm_spatial_persistent, false,
    "Whether enable CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for cudnn "
    "batch_norm, default is False.");
#endif

qili93

LGTM

[MLU]adapt mlu device for running dbnet network

7851977

paddle-bot bot added contributor status: proposed labels Oct 8, 2022

ronny1996 approved these changes Oct 9, 2022

qili93 approved these changes Oct 10, 2022

MissPenguin approved these changes Oct 10, 2022

MissPenguin merged commit a706908 into PaddlePaddle:dygraph Oct 10, 2022
