
add use_global_stats in nn.BatchNorm #9420

Merged
merged 1 commit into from Jan 22, 2018

Conversation

tornadomeet
Contributor

tornadomeet commented Jan 14, 2018

Description

#9419

Checklist

Essentials

  • Passed code style checking (make lint)
  • Changes are complete (i.e. I finished coding on this PR)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • To the best of my knowledge, examples are either not affected by this change or have been fixed to be compatible with this change

Member

@szha left a comment


LGTM. Ping @piiswrong in case there's an additional comment.

@piiswrong
Contributor

Not sure if we want to add this option, especially under this name.
Does it have a reference from a paper or another framework? Is there a more commonly used name?

@tornadomeet
Contributor Author

tornadomeet commented Jan 16, 2018

piiswrong merged commit dae6cda into apache:master Jan 22, 2018
@7oud

7oud commented Feb 25, 2018

@szha @tornadomeet If training with use_global_stats=True, it seems that all the moving_mean values stay 0 and the moving_var values stay 1 in the trained model. Is that right? Then batch norm turns into a simple scale-and-shift op. In what situations should use_global_stats=True be used?

@thbupt

thbupt commented Feb 26, 2018

@7oud I have the same question. I think use_global_stats=True should be used when you fine-tune a pretrained model such as ResNet or VGG.

@tornadomeet
Contributor Author

@7oud Just as @thbupt said, it is used for fine-tuning a pre-trained model, for example to freeze some layers that use BN during fine-tuning. When fine-tuning, moving_mean and moving_var are initialized from the pre-trained model, so moving_mean will not be 0 and moving_var will not be 1.
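A minimal Gluon sketch of that fine-tuning setup (the layer sizes below are made up for illustration; a real fine-tune would load pretrained parameters instead of initializing from scratch):

```python
import mxnet as mx
from mxnet.gluon import nn

net = nn.HybridSequential()
with net.name_scope():
    net.add(nn.Conv2D(16, kernel_size=3, padding=1))
    # use_global_stats=True: BN always normalizes with the stored
    # moving_mean / moving_var, even inside a training pass, so the
    # pretrained statistics stay fixed while fine-tuning.
    net.add(nn.BatchNorm(use_global_stats=True))
    net.add(nn.Activation('relu'))
    net.add(nn.Dense(10))

net.initialize(mx.init.Xavier())  # in practice: load pretrained parameters here
x = mx.nd.random.uniform(shape=(8, 3, 32, 32))
with mx.autograd.record():
    y = net(x)  # BN uses the global stats, not this batch's statistics
```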

@7oud

7oud commented Feb 27, 2018

@thbupt @tornadomeet I found that in some small-dataset tasks such as segmentation (training from scratch), the inference result is worse than the training result when using BatchNorm without use_global_stats. Have you run into similar situations?

@thbupt

thbupt commented Feb 27, 2018

@7oud If you train from scratch, use_global_stats should be false in training and true in testing, which is the default behaviour in MXNet.
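A quick sketch of that default behaviour (nothing here is specific to this PR; it only shows when the moving statistics are used):

```python
import mxnet as mx
from mxnet.gluon import nn

bn = nn.BatchNorm()          # use_global_stats defaults to False
bn.initialize()
x = mx.nd.random.uniform(shape=(8, 4, 16, 16))

with mx.autograd.record():   # training pass: batch mean/var are used,
    y_train = bn(x)          # and moving_mean/moving_var get updated

y_infer = bn(x)              # inference pass: moving_mean/moving_var are used
```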

@tornadomeet
Contributor Author

tornadomeet commented Feb 27, 2018

@7oud Do you mean that in your small task, setting use_global_stats=True during training gives a better result than use_global_stats=False, which is the default setting?

If that is true, it means BN is not doing any useful work in your task, so just remove BN from your network.

@7oud

7oud commented Feb 27, 2018

@thbupt Actually I did what you said, but the same data batch produces different outputs with forward(is_train=False) and forward(is_train=True), and the inference results are worse. So I tried training with use_global_stats=True, which gives the same results.

@thbupt

thbupt commented Feb 27, 2018

@tornadomeet Is there a simple way to set use_global_stats=True for all layers when fine-tuning? I know one way is to set use_global_stats=True for each BN layer separately when adding nn.BatchNorm.

@7oud

7oud commented Feb 27, 2018

@tornadomeet It seems that way, but I cannot draw a firm conclusion, because the dataset is too small to be reliable.

@thbupt

thbupt commented Feb 27, 2018

@7oud What is your batch size? BN seems to prefer large batch sizes.

@tornadomeet
Contributor Author

@7oud The correct way to use BN when training from scratch is to set use_global_stats=False.

Just make use_global_stats a parameter of your block class; then you only need to change it in one place.
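A rough sketch of that idea (the block name and layer layout below are hypothetical; only the use_global_stats plumbing matters):

```python
from mxnet.gluon import nn

class ConvBNBlock(nn.HybridBlock):
    # hypothetical building block: a single flag is forwarded to its BN layer
    def __init__(self, channels, use_global_stats=False, **kwargs):
        super(ConvBNBlock, self).__init__(**kwargs)
        with self.name_scope():
            self.conv = nn.Conv2D(channels, kernel_size=3, padding=1)
            self.bn = nn.BatchNorm(use_global_stats=use_global_stats)

    def hybrid_forward(self, F, x):
        return F.relu(self.bn(self.conv(x)))

# one switch flips every BN layer built through this block
net = nn.HybridSequential()
with net.name_scope():
    for ch in (16, 32, 64):
        net.add(ConvBNBlock(ch, use_global_stats=True))
```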

@7oud

7oud commented Feb 27, 2018

@thbupt The batch size in training is 8, and in inference it is usually 1.

@thbupt

thbupt commented Feb 27, 2018

@7oud I think 8 is too small for BN; you can try a larger batch size like 16 or 32.

@jonbakerfish
Contributor

jonbakerfish commented Jun 3, 2018

In Gluon, do we need to set use_global_stats=True for all the layers when we use a pre-trained model (e.g., resnet) to extract features or run inference? If so, how can we do that?

In #3871, it is said that is_train affects BatchNorm's behavior. But I can't see any is_train option in https://mxnet.incubator.apache.org/api/python/gluon/model_zoo.html.

@szha
Member

szha commented Jun 3, 2018

@jonbakerfish that flag is automatically set by module or autograd.record. It can be queried via autograd.is_training and overridden with autograd.train_mode/predict_mode when using autograd.
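For reference, a small sketch of those autograd switches (just illustrating the calls szha mentions):

```python
import mxnet as mx
from mxnet import autograd

print(autograd.is_training())          # False outside record()

with autograd.record():                # training scope: BatchNorm uses batch stats
    print(autograd.is_training())      # True
    with autograd.predict_mode():      # force inference behaviour inside record()
        print(autograd.is_training())  # False

with autograd.train_mode():            # training behaviour without recording gradients
    print(autograd.is_training())      # True
```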
