Hello!
When I used tools/dist_train.sh to train voxelpose, I found that across single-GPU, dual-GPU, and 8-GPU DDP training, the more GPUs I use, the worse the results get. I then tried turning on the autoscale_lr option, but the results got even worse :(
In the config I use lr=0.0001 (consistent with voxelpose) and batch_size=8.
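For context, this is the linear lr scaling rule I assume autoscale_lr applies; I have not checked the mmpose implementation, and the base effective batch size below is my own assumption:

```python
# Sketch of the linear lr scaling rule I assume autoscale_lr follows;
# the base effective batch size of 8 (1 GPU x batch_size=8) is my assumption,
# not something I verified in the mmpose code.
base_lr = 0.0001        # lr in my config, same as the voxelpose config
samples_per_gpu = 8     # assuming batch_size=8 in my config is per GPU
base_batch_size = 8     # effective batch size base_lr was presumably tuned for

def scaled_lr(num_gpus: int) -> float:
    """Scale lr linearly with the total (effective) batch size."""
    return base_lr * (num_gpus * samples_per_gpu) / base_batch_size

for n in (1, 2, 8):
    print(n, scaled_lr(n))   # 1 -> 0.0001, 2 -> 0.0002, 8 -> 0.0008
```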
1. Is there something else I may have overlooked that causes this problem?
2. After reading some blog posts, I found that computing BN statistics independently on each GPU may be one reason for the accuracy drop with multiple GPUs. In mmpose, does BN synchronization during distributed training need to be set manually, or is it done automatically? (See the sketch after this list for what I mean by setting it manually.)
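By "manually" I mean something like converting the model's BN layers before wrapping it with DDP, using PyTorch's built-in converter; whether and where mmpose already does this internally is exactly what I'm asking. The model below is just a placeholder:

```python
# Hypothetical manual SyncBN setup before DDP, only to illustrate question 2;
# I don't know whether mmpose already does this somewhere internally.
import torch.nn as nn

model = nn.Sequential(nn.Conv3d(3, 16, 3), nn.BatchNorm3d(16))  # placeholder model

# Replace every BatchNorm*d with SyncBatchNorm (a process group must be
# initialized at training time; dist_train.sh handles that part).
model = nn.SyncBatchNorm.convert_sync_batchnorm(model)

# model = nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[local_rank])
```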
[Update for question 2] I tried selecting SyncBN instead of BN3d when calling build_norm_layer, and the accuracy has partly recovered, but it is still not as good as the single-GPU result.
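Concretely, the change was along these lines (mmcv's build_norm_layer; the channel count is just an example, and the exact place in the voxelpose model where the norm_cfg is consumed may differ):

```python
# Minimal sketch of swapping BN3d for SyncBN via mmcv's build_norm_layer.
from mmcv.cnn import build_norm_layer

num_features = 32  # example channel count

# Before: per-GPU batch statistics (nn.BatchNorm3d under the hood).
_, bn3d = build_norm_layer(dict(type='BN3d'), num_features)

# After: statistics synchronized across GPUs (nn.SyncBatchNorm; needs
# torch.distributed to be initialized during training, which dist_train.sh does).
_, sync_bn = build_norm_layer(dict(type='SyncBN'), num_features)

print(bn3d)
print(sync_bn)
```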