This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Synchronized Batch Normalization in multi-GPU training #8458

Closed
cccorn opened this issue Oct 28, 2017 · 19 comments

Comments

@cccorn

cccorn commented Oct 28, 2017

I found that Batch Norm does not synchronize the mean and variance across multiple GPUs during the training phase. However, I want to train ResNet on large images, so I can only fit a small batch on each device. I think it is not appropriate to use Batch Norm when the batch size is so small (5 images).
Is there any way to easily solve this problem?

@Yezipiaomu

Also need Synchronized Batch Normalization!

@Yezipiaomu

@tqchen @pluskid @mli

@Jerryzcn
Contributor

Jerryzcn commented Oct 30, 2017

Actually, no major framework synchronizes the mean and variance, since this will significantly slow down training if the cross-device bandwidth is low. That being said, I think this is a very useful feature. For semantic segmentation, especially reproducing PSPNet, synchronized batch norm is necessary.

@Yezipiaomu

So, how do we implement it? @Jerryzcn It seems that it's important for segmentation.

@Jerryzcn
Contributor

@Tommymhz I would suggest implementing your own batch norm in Gluon and syncing it across all GPUs.
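
A rough, forward-only sketch of that idea (this is not Jerryzcn's code or an official layer; the helper name `sync_batch_norm` and the CPU-aggregation strategy are purely illustrative). A real implementation would also need a synchronized backward pass and running-statistics updates, which is the hard part:

```python
# Forward-only sketch: compute one set of BN statistics shared by all GPUs
# by aggregating per-device partial sums on the CPU. Illustrative only.
import mxnet as mx
from mxnet import nd

def sync_batch_norm(xs, gamma, beta, eps=1e-5):
    """xs: list of NCHW NDArrays, one per GPU; gamma/beta: shape (C,)."""
    # Per-device partial sums over every axis except the channel axis.
    sums = [x.sum(axis=(0, 2, 3)).copyto(mx.cpu()) for x in xs]
    sq_sums = [(x * x).sum(axis=(0, 2, 3)).copyto(mx.cpu()) for x in xs]
    count = sum(x.shape[0] * x.shape[2] * x.shape[3] for x in xs)

    mean = sum(sums) / count                  # global per-channel mean
    var = sum(sq_sums) / count - mean * mean  # global per-channel variance

    outs = []
    for x in xs:
        # Broadcast the shared statistics back to each device and normalize.
        m = mean.copyto(x.context).reshape((1, -1, 1, 1))
        v = var.copyto(x.context).reshape((1, -1, 1, 1))
        g = gamma.copyto(x.context).reshape((1, -1, 1, 1))
        b = beta.copyto(x.context).reshape((1, -1, 1, 1))
        outs.append((x - m) / nd.sqrt(v + eps) * g + b)
    return outs
```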

@Yezipiaomu

However, my code is based on the symbol API!

@zdwong

zdwong commented Dec 15, 2017

I need Synchronized Batch Normalization too. My code is based on the symbol API. Any suggestions?

@KeyKy
Contributor

KeyKy commented Dec 15, 2017

Could you share your code?

@Jerryzcn
Contributor

@zhanghang1989

@zhanghang1989
Contributor

There is no easy way to solve Sync BN. I have implemented it using PyTorch (https://hangzh.com/PyTorch-Encoding/notes/syncbn.html) by rewriting DataParallel and the cross-GPU communication. I will implement an MXNet Gluon version as well, and will probably release it early next year.

@John1231983

@zhanghang1989: How about TensorFlow? Could you also implement it? We really need it in TensorFlow.

@marcoabreu
Contributor

@John1231983 This repository is for MXNet, not TensorFlow. Please refrain from requests regarding other frameworks.

@oronanschel

Hi, has anyone implemented this yet?

@zhanghang1989
Contributor

zhanghang1989 commented Mar 30, 2018

Implemented in PR #11502
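
For reference, the layer from that PR is exposed in Gluon as mxnet.gluon.contrib.nn.SyncBatchNorm. A minimal usage sketch; the device count here is illustrative and should match your training setup:

```python
# Minimal usage sketch; num_devices should equal the number of GPUs used
# for training (the value 4 is just an example).
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1),
        SyncBatchNorm(num_devices=4),  # one set of BN statistics shared by 4 GPUs
        nn.Activation('relu'))
```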

@cccorn
Author

cccorn commented Mar 30, 2018

@zhanghang1989 Thanks!

@thomelane
Contributor

@sandeep-krishnamurthy This issue can be closed now, thanks. Further discussion should happen on PR #11502.

@cccorn cccorn closed this as completed Jul 22, 2018
@pengwangucla

There is no easy way to solve Sync BN. I have implemented it using PyTorch (https://hangzh.com/PyTorch-Encoding/notes/syncbn.html) by rewriting DataParallel and the cross-GPU communication. I will implement an MXNet Gluon version as well, and will probably release it early next year.

Hi Hang, I used your sync_bn implementation with the MXNet symbol API. However, it reduced the performance of my network. I wonder whether you have ever tried your sync_bn with the symbol API, rather than Gluon. Thanks

@zhanghang1989
Contributor

SyncBN does reduce speed, because of the synchronization at every BN layer during both the forward and backward passes. Use it only when it is necessary, especially on computation-intensive tasks such as semantic segmentation or object detection, where the synchronization is not a huge bottleneck compared to the computation.
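
In practice that advice amounts to picking the norm layer per task. A hedged sketch (the batch-size threshold is made up for illustration, not a rule from the library):

```python
# Illustrative only: pay the synchronization cost only when the per-GPU
# batch is small enough that plain BN statistics become noisy.
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm

def make_norm(per_gpu_batch, num_devices):
    if per_gpu_batch < 8:  # arbitrary threshold; typical for segmentation/detection
        return SyncBatchNorm(num_devices=num_devices)
    return nn.BatchNorm()
```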

@pengwangucla

pengwangucla commented Apr 21, 2019

SyncBN does reduce speed, because of the synchronization at every BN layer during both the forward and backward passes. Use it only when it is necessary, especially on computation-intensive tasks such as semantic segmentation or object detection, where the synchronization is not a huge bottleneck compared to the computation.

Thanks for the reply. I understand it reduces the speed, as I observed, but it also reduces the accuracy when starting from a pretrained SE-Net. I wonder whether you have tried it with the symbol API? Thanks
