This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

Synchronized Batch Normalization in multi-GPU training #8458

Closed
cccorn opened this issue Oct 28, 2017 · 19 comments

Comments

@cccorn

cccorn commented Oct 28, 2017

I found that Batch Norm does not synchronize the mean and variance across multiple GPUs during the training phase. However, I want to train ResNet on large images, so I can only fit a small batch on each device. I think it is not appropriate to use Batch Norm when the batch size is so small (5 images).
Is there any way to easily solve this problem?

@Yezipiaomu

Also need Synchronized Batch Normalization!

@Yezipiaomu

@tqchen @pluskid @mli

@Jerryzcn
Contributor

Jerryzcn commented Oct 30, 2017

Actually, no major framework synchronizes the mean and variance, since this will significantly slow down training if the cross-device bandwidth is low. That being said, I think this is a very useful feature. For semantic segmentation, especially reproducing PSPNet, synchronized batch norm is necessary.

@Yezipiaomu

So, how do we implement it? @Jerryzcn It seems that it's important for segmentation.

@Jerryzcn
Contributor

@Tommymhz I would suggest implementing your own batch norm in Gluon and syncing it across all GPUs.
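
A rough, forward-only sketch of that idea (this is not Jerryzcn's code or an official layer; the helper name `sync_batch_norm` and the CPU-aggregation strategy are purely illustrative). A real implementation would also need a synchronized backward pass and running-statistics updates, which is the hard part:

```python
# Forward-only sketch: compute one set of BN statistics shared by all GPUs
# by aggregating per-device partial sums on the CPU. Illustrative only.
import mxnet as mx
from mxnet import nd

def sync_batch_norm(xs, gamma, beta, eps=1e-5):
    """xs: list of NCHW NDArrays, one per GPU; gamma/beta: shape (C,)."""
    # Per-device partial sums over every axis except the channel axis.
    sums = [x.sum(axis=(0, 2, 3)).copyto(mx.cpu()) for x in xs]
    sq_sums = [(x * x).sum(axis=(0, 2, 3)).copyto(mx.cpu()) for x in xs]
    count = sum(x.shape[0] * x.shape[2] * x.shape[3] for x in xs)

    mean = sum(sums) / count                  # global per-channel mean
    var = sum(sq_sums) / count - mean * mean  # global per-channel variance

    outs = []
    for x in xs:
        # Broadcast the shared statistics back to each device and normalize.
        m = mean.copyto(x.context).reshape((1, -1, 1, 1))
        v = var.copyto(x.context).reshape((1, -1, 1, 1))
        g = gamma.copyto(x.context).reshape((1, -1, 1, 1))
        b = beta.copyto(x.context).reshape((1, -1, 1, 1))
        outs.append((x - m) / nd.sqrt(v + eps) * g + b)
    return outs
```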

@Yezipiaomu

However, my code is based on the symbol API!

@zdwong

zdwong commented Dec 15, 2017

I need Synchronized Batch Normalization too. My code is based on the symbol API. Any suggestions?

@KeyKy
Contributor

KeyKy commented Dec 15, 2017

Could you share your code?

@Jerryzcn
Contributor

@zhanghang1989

@zhanghang1989
Contributor

There is no easy way to solve Sync BN. I have implemented it using PyTorch (https://hangzh.com/PyTorch-Encoding/notes/syncbn.html) by rewriting DataParallel and the cross-GPU communication. I will implement an MXNet Gluon version as well, and will probably release it early next year.

@John1231983

@zhanghang1989: How about TensorFlow? Could you also implement it? We really need it in TensorFlow.

@marcoabreu
Contributor

@John1231983 This repository is for MXNet, not TensorFlow. Please refrain from requests regarding other frameworks.

@oronanschel

Hi, has anyone implemented this yet?

@zhanghang1989
Contributor

zhanghang1989 commented Mar 30, 2018

Implemented in PR #11502
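
For reference, the layer from that PR is exposed in Gluon as mxnet.gluon.contrib.nn.SyncBatchNorm. A minimal usage sketch; the device count here is illustrative and should match your training setup:

```python
# Minimal usage sketch; num_devices should equal the number of GPUs used
# for training (the value 4 is just an example).
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm

net = nn.HybridSequential()
net.add(nn.Conv2D(64, kernel_size=3, padding=1),
        SyncBatchNorm(num_devices=4),  # one set of BN statistics shared by 4 GPUs
        nn.Activation('relu'))
```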

@cccorn
Author

cccorn commented Mar 30, 2018

@zhanghang1989 Thanks!

@thomelane
Contributor

@sandeep-krishnamurthy This issue can be closed now, thanks. Further discussion should happen on PR #11502.

@cccorn cccorn closed this as completed Jul 22, 2018
@pengwangucla

There is no easy way to solve Sync BN. I have implemented it using PyTorch (https://hangzh.com/PyTorch-Encoding/notes/syncbn.html) by rewriting DataParallel and the cross-GPU communication. I will implement an MXNet Gluon version as well, and will probably release it early next year.

Hi Hang, I used your sync_bn implementation with the MXNet symbol API. However, it reduced the performance of my network. I wonder whether you have ever tried your sync_bn with the symbol API, rather than Gluon. Thanks

@zhanghang1989
Contributor

SyncBN does reduce speed, because of the synchronization at every BN layer during both the forward and backward passes. Use it only when it is necessary, especially on computation-intensive tasks such as semantic segmentation or object detection, where the synchronization is not a huge bottleneck compared to the computation.
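
In practice that advice amounts to picking the norm layer per task. A hedged sketch (the batch-size threshold is made up for illustration, not a rule from the library):

```python
# Illustrative only: pay the synchronization cost only when the per-GPU
# batch is small enough that plain BN statistics become noisy.
from mxnet.gluon import nn
from mxnet.gluon.contrib.nn import SyncBatchNorm

def make_norm(per_gpu_batch, num_devices):
    if per_gpu_batch < 8:  # arbitrary threshold; typical for segmentation/detection
        return SyncBatchNorm(num_devices=num_devices)
    return nn.BatchNorm()
```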

@pengwangucla

pengwangucla commented Apr 21, 2019

SyncBN does reduce speed, because of the synchronization at every BN layer during both the forward and backward passes. Use it only when it is necessary, especially on computation-intensive tasks such as semantic segmentation or object detection, where the synchronization is not a huge bottleneck compared to the computation.

Thanks for the reply. I understand it reduces the speed, as I observed, but it also reduces the accuracy when starting from a pretrained SE-Net. I wonder whether you have tried it with the symbol API? Thanks
