
Accuracy drop in multi-GPU training #1634

Open · zbzbzbb95 opened this issue on Sep 6, 2022 · 1 comment
Labels: kind/bug (something isn't working)


zbzbzbb95 commented Sep 6, 2022


Hello!
When I used tools/dist_train.sh to train VoxelPose, I found that accuracy drops as the number of GPUs grows: single-GPU training gives the best results, 2-GPU DDP training is worse, and 8-GPU DDP training is worse still. I also tried enabling the autoscale_lr option, but the results got even worse :(
In the config, lr=0.0001 (consistent with the original VoxelPose) and batch_size=8.
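One standard DDP effect that fits this pattern, offered as a possibility rather than a diagnosis: with the per-GPU batch size fixed, each extra GPU multiplies the global batch size and divides the number of optimizer steps per epoch, while the learning rate stays at 0.0001 unless it is rescaled. The sketch below computes both quantities. It assumes that `batch_size=8` means samples per GPU, that lr=0.0001 was tuned for a single GPU, and a hypothetical dataset of 10,000 samples; `steps_per_epoch` and `linear_scaled_lr` are hypothetical helpers, not mmpose APIs. The linear scaling rule shown is one common heuristic, and mmpose's own `autoscale_lr` may use a different reference point, so its exact factor should be checked in `tools/train.py`.

```python
import math


def steps_per_epoch(dataset_size: int, num_gpus: int, samples_per_gpu: int) -> int:
    """Optimizer steps per epoch under DDP with a DistributedSampler-style split."""
    # Each GPU (replica) receives ceil(dataset_size / num_gpus) samples per epoch.
    samples_per_replica = math.ceil(dataset_size / num_gpus)
    # Every GPU takes one step per local mini-batch; DDP averages gradients across
    # GPUs, so all replicas step together and one step consumes the global batch.
    return math.ceil(samples_per_replica / samples_per_gpu)


def linear_scaled_lr(base_lr: float, global_batch: int, base_batch: int) -> float:
    """Linear scaling rule (heuristic): lr grows in proportion to the global batch."""
    return base_lr * global_batch / base_batch


if __name__ == "__main__":
    dataset_size = 10_000  # hypothetical dataset size
    samples_per_gpu = 8    # assumption: batch_size=8 is per GPU
    base_lr = 1e-4         # lr from the config, assumed tuned for 1 GPU x 8 samples
    for gpus in (1, 2, 8):
        global_batch = gpus * samples_per_gpu
        steps = steps_per_epoch(dataset_size, gpus, samples_per_gpu)
        lr = linear_scaled_lr(base_lr, global_batch, base_batch=samples_per_gpu)
        print(f"{gpus} GPU(s): global batch {global_batch}, "
              f"{steps} steps/epoch, linearly scaled lr {lr:.4g}")
```

Under these assumptions, 8 GPUs give a global batch of 64 and 157 steps per epoch versus 1,250 on one GPU, and the linear rule would raise the lr from 0.0001 to 0.0008. If the number of epochs and the lr schedule stay fixed, the 8-GPU run performs roughly 8× fewer updates, which is one plausible contributor to the GPU-count dependence.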

  1. Is there anything else I overlooked that could cause this problem?
  2. From what I have read, computing BatchNorm statistics independently on each GPU may be one of the reasons multi-GPU training loses accuracy. During distributed training in mmpose, do I need to enable BN synchronization (SyncBN) manually, or is it done automatically?
    [Update on question 2] I tried selecting SyncBN instead of BN3d in build_norm_layer. The accuracy partly recovered, but it is still not as good as single-GPU training.
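Question 2 can be made concrete. With ordinary BN3d under DDP, each GPU normalizes with statistics computed only from its own slice of the batch; SyncBN combines them so every GPU normalizes with the statistics of the whole global batch. The pure-Python sketch below illustrates this for a single channel. The helpers `shard`, `local_bn_stats` and `synced_bn_stats` and the 16-value batch are hypothetical, and merging per-GPU counts, sums and sums of squares is one way to combine the statistics, not necessarily the exact kernel PyTorch's `SyncBatchNorm` uses. In PyTorch itself, existing BatchNorm layers (including `BatchNorm3d`) can be swapped with `torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)`; in mmcv-style configs, `norm_cfg=dict(type='SyncBN')` selects it through `build_norm_layer`.

```python
from statistics import fmean, pvariance


def shard(values, num_gpus):
    """Split a global batch into equal contiguous per-GPU chunks (assumes divisibility)."""
    size = len(values) // num_gpus
    return [values[i * size:(i + 1) * size] for i in range(num_gpus)]


def local_bn_stats(shards):
    """Plain BN under DDP: every GPU uses its own mean and (biased) variance."""
    return [(fmean(s), pvariance(s)) for s in shards]


def synced_bn_stats(shards):
    """SyncBN idea: merge per-GPU count, sum and sum of squares into global stats."""
    count = sum(len(s) for s in shards)
    total = sum(sum(s) for s in shards)
    total_sq = sum(sum(x * x for x in s) for s in shards)
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, as used for normalization
    return mean, var


if __name__ == "__main__":
    activations = [float(i) for i in range(16)]  # hypothetical single-channel batch
    shards = shard(activations, num_gpus=8)
    print("per-GPU (BN3d) stats:", local_bn_stats(shards))
    print("synced (SyncBN) stats:", synced_bn_stats(shards))
    print("global reference:", (fmean(activations), pvariance(activations)))
```

Here each of the 8 shards holds two values, so every GPU normalizes with a variance of 0.25 while the variance of the full batch is 21.25; the synced path recovers the global mean 7.5 and variance 21.25.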

mm-assistant bot commented Sep 6, 2022

We recommend using English, or English and Chinese, in issues so that we can have a broader discussion.

zbzbzbb95 changed the title from the Chinese-only "多gpu训练效果下降问题" ("Accuracy drop in multi-GPU training") to the bilingual "多gpu训练效果下降问题 Effect decline problem about multi-gpu training" on Sep 6, 2022
jin-s13 added the kind/bug (something isn't working) label on Sep 6, 2022
Projects: None yet
Development: No branches or pull requests

3 participants