Synchronized batch normalization in multi-GPU training #8458
Comments
Also need the synchronized batch normalization!
Actually, no major framework synchronizes the mean and variance, since this would significantly slow down training if the cross-GPU bandwidth is low. That being said, I think this is a very useful feature. For semantic segmentation, especially for reproducing PSPNet, synchronized batch norm is necessary.
So, how do we implement it? @Jerryzcn It seems that it's important for segmentation.
@Tommymhz I would suggest implementing your own batch norm in Gluon and synchronizing the statistics across all GPUs.
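For anyone who wants to try that route, here is a minimal sketch of the statistics aggregation such a custom layer needs: gather per-device sums and squared sums onto one context and derive a single global mean/variance. The function name and the use of `as_in_context` for the cross-device copy are only illustrative, not part of any released API:

```python
import mxnet as mx

def global_bn_stats(xs):
    """Combine per-GPU activations into one global mean/variance.

    xs: list of NDArrays of shape (N_i, C, H, W), one per GPU.
    Returns (mean, var), each of shape (C,), on the first GPU's context.
    """
    ctx = xs[0].context
    # Per-device sums and squared sums over all axes except the channel axis.
    sums = [x.sum(axis=(0, 2, 3)).as_in_context(ctx) for x in xs]
    sq_sums = [(x * x).sum(axis=(0, 2, 3)).as_in_context(ctx) for x in xs]
    count = sum(x.shape[0] * x.shape[2] * x.shape[3] for x in xs)

    total = mx.nd.add_n(*sums)
    total_sq = mx.nd.add_n(*sq_sums)
    mean = total / count
    var = total_sq / count - mean * mean  # biased variance, as in training-mode BN
    return mean, var
```

A full layer also needs the learnable gamma/beta, running statistics, and a synchronized backward pass; the backward synchronization is the hard part, which is why a dedicated implementation is discussed below.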
However, my code is based on the symbol API.
I need synchronized batch normalization too. My code is based on the symbol API. Any suggestions?
Could you share your code?
There is no easy way to solve sync BN. I have implemented it in PyTorch (https://hangzh.com/PyTorch-Encoding/notes/syncbn.html) by rewriting DataParallel and the cross-GPU communication. I will implement an MXNet Gluon version as well, probably releasing it early next year.
@zhanghang1989: How about TensorFlow? Could you also implement it? We really need it in TensorFlow.
@John1231983 This repository is for MXNet, not TensorFlow. Please refrain from requests regarding other frameworks.
Hi, has anyone implemented this yet?
Implemented in PR #11502.
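For anyone landing here later: the layer added by that PR is exposed under `mxnet.gluon.contrib.nn` (MXNet 1.3+). A minimal usage sketch, assuming the `num_devices` argument as I remember it from the docs (worth double-checking against your MXNet version):

```python
import mxnet as mx
from mxnet import gluon
from mxnet.gluon.contrib.nn import SyncBatchNorm

num_gpus = 4
ctx = [mx.gpu(i) for i in range(num_gpus)]

net = gluon.nn.HybridSequential()
net.add(gluon.nn.Conv2D(64, kernel_size=3, padding=1),
        # num_devices tells the layer how many GPUs share the batch statistics.
        SyncBatchNorm(num_devices=num_gpus),
        gluon.nn.Activation('relu'))
net.initialize(ctx=ctx)
```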
@zhanghang1989 Thanks!
@sandeep-krishnamurthy This issue can be closed now, thanks. Further discussion should take place on PR #11502.
Hi Hang, I used your sync_bn implementation with the MXNet symbol API. However, it reduced the performance of my network. I wonder whether you have ever tried your sync_bn with the symbol API rather than Gluon. Thanks.
SyncBN does reduce speed, because of the synchronization at every BN layer during both the forward and backward passes. Use it only when necessary, especially for computation-intensive tasks such as semantic segmentation or object detection, where the synchronization is not a huge bottleneck compared to the computation.
Thanks for the reply. I understand it reduces speed, as I observed, but it also reduces the accuracy of a pretrained SE-Net. I wonder whether you have tried it using the symbol API? Thanks.
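Regarding the symbol API: as far as I know, the same operator is also exposed as `mx.sym.contrib.SyncBatchNorm`. A sketch assuming the `key`/`ndev` arguments from the contrib docs (please verify against your MXNet version, since these names are from memory):

```python
import mxnet as mx

data = mx.sym.Variable('data')
conv = mx.sym.Convolution(data=data, num_filter=64, kernel=(3, 3),
                          pad=(1, 1), name='conv1')
# 'key' names the layer so the per-device workers know which statistics to share;
# 'ndev' is the number of GPUs participating in the synchronization.
bn = mx.sym.contrib.SyncBatchNorm(data=conv, key='bn1', ndev=4, name='bn1')
act = mx.sym.Activation(data=bn, act_type='relu', name='relu1')
```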
I found that Batch Norm does not synchronize the mean and variance across multiple GPUs during the training phase. But I want to train ResNet on large images, so I can only use a small batch on each device. I think it is not appropriate to use Batch Norm when the batch size is too small (5 images per device).
Is there any way to easily solve this problem?