[MLU]adapt mlu device for running dbnet network #7835

Merged: 1 commit, Oct 10, 2022

@qipengh qipengh commented Oct 8, 2022

Adapt PaddleOCR to the Cambricon MLU chip so that the DBNet text detection network can run on it.

# 8-card training script
python3 -m paddle.distributed.launch --mlus '0,1,2,3,4,5,6,7' tools/train.py -c configs/det/det_mv3_db.yml -o Global.pretrained_model=./pretrain_models/MobileNetV3_large_x0_5_pretrained

The results of the run are shown below:
image

paddle-bot bot commented Oct 8, 2022

Thanks for your contribution!

if paddle.is_compiled_with_cuda():
    AMP_RELATED_FLAGS_SETTING.update({
        'FLAGS_cudnn_batchnorm_spatial_persistent': 1
    })
Author

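On devices other than GPU, running in AMP mode raises an error, because FLAGS_cudnn_batchnorm_spatial_persistent is a GPU-only flag. This change adds a separate check so that the flag is applied only when paddle.is_compiled_with_cuda() returns True.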
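
The guard described above can be sketched as a small pure function. This is only an illustration under stated assumptions: `build_amp_flags`, `CUDNN_BN_FLAG`, and the `base_flags` parameter are hypothetical names that do not appear in the PR; only the flag name, `paddle.is_compiled_with_cuda()`, and `paddle.fluid.set_flags` come from the conversation.

```python
from typing import Dict, Optional

# Flag name taken from the PR discussion; per the author it is a GPU-only flag.
CUDNN_BN_FLAG = "FLAGS_cudnn_batchnorm_spatial_persistent"


def build_amp_flags(compiled_with_cuda: bool,
                    base_flags: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Return the AMP flag dict, adding the cuDNN flag only for CUDA builds.

    `base_flags` is a hypothetical placeholder for any device-independent
    flags the training script already sets.
    """
    flags = dict(base_flags or {})  # copy so the caller's dict is not mutated
    if compiled_with_cuda:
        flags[CUDNN_BN_FLAG] = 1
    return flags


# With Paddle installed, the call would look roughly like this (assumption):
#   import paddle
#   paddle.fluid.set_flags(build_amp_flags(paddle.is_compiled_with_cuda()))
if __name__ == "__main__":
    print(build_amp_flags(True))
    print(build_amp_flags(False))
```

Keeping the decision in a function that takes a plain boolean makes the behaviour easy to check without an MLU or GPU build at hand.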

Contributor

Doesn't this flag only control the behavior of the batch norm kernel? Why would it raise an error?

Author

The error message is as follows:

[2022/10/08 12:24:37] ppocr INFO: train dataloader has 8 iters
[2022/10/08 12:24:37] ppocr INFO: valid dataloader has 500 iters
Traceback (most recent call last):
  File "tools/train.py", line 212, in <module>
    main(config, device, logger, vdl_writer)
  File "tools/train.py", line 161, in main
    paddle.fluid.set_flags(AMP_RELATED_FLAGS_SETTING)
  File "/projs/framework/huangqipeng/pdd_a/venv_py38/lib/python3.8/site-packages/paddle/fluid/framework.py", line 7287, in set_flags
    raise ValueError(
ValueError: Flag FLAGS_cudnn_batchnorm_spatial_persistent cannot set its value through this function.
INFO 2022-10-08 12:24:43,989 launch_utils.py:350] terminate all the procs

The likely cause is in Paddle itself: at paddle/fluid/platform/flags.cc +297, the registration of this flag is wrapped in a PADDLE_WITH_CUDA guard, so the flag is not available in a PADDLE_WITH_MLU build.

#if defined(PADDLE_WITH_CUDA) || defined(PADDLE_WITH_HIP)
......
PADDLE_DEFINE_EXPORTED_bool(
    cudnn_batchnorm_spatial_persistent,
    false,
    "Whether enable CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for cudnn "
    "batch_norm, default is False.");
#endif
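
The effect of that compile-time guard can be imitated with a toy registry. This is a simulation for illustration only, not Paddle's implementation: `FlagRegistry` and its methods are hypothetical names, and the error text simply copies the message from the traceback above.

```python
from typing import Any, Dict

FLAG = "FLAGS_cudnn_batchnorm_spatial_persistent"


class FlagRegistry:
    """Toy stand-in for a framework's exported-flag table (hypothetical)."""

    def __init__(self, with_cuda: bool) -> None:
        self._flags: Dict[str, Any] = {}
        if with_cuda:
            # Plays the role of the `#if defined(PADDLE_WITH_CUDA) ...` guard:
            # the flag exists only when the build includes CUDA support.
            self._flags[FLAG] = False

    def set_flags(self, flags: Dict[str, Any]) -> None:
        for name, value in flags.items():
            if name not in self._flags:
                # An unregistered flag is rejected, as in the traceback.
                raise ValueError(
                    f"Flag {name} cannot set its value through this function.")
            self._flags[name] = value

    def get(self, name: str) -> Any:
        return self._flags[name]


if __name__ == "__main__":
    FlagRegistry(with_cuda=True).set_flags({FLAG: True})  # accepted
    try:
        FlagRegistry(with_cuda=False).set_flags({FLAG: True})
    except ValueError as err:
        print(err)
```

In this model the flag is rejected because it was never registered in the non-CUDA build, regardless of which kernel it would have affected.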

Contributor

@qili93 qili93 left a comment

LGTM

@MissPenguin MissPenguin merged commit a706908 into PaddlePaddle:dygraph Oct 10, 2022