
NAG Optimizer with multi-precision support #14568

Merged
merged 3 commits into apache:master from anirudhacharya:nag_mp
May 30, 2019

Conversation

anirudhacharya
Member

@anirudhacharya anirudhacharya commented Mar 29, 2019

Description

NAG Optimizer with multi-precision support. Tests already exist for this.

Checklist

Essentials

Please feel free to remove inapplicable items for your PR.

  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage:
  • Unit tests are added for small changes to verify correctness (e.g. adding a new operator)
  • Nightly tests are added for complicated/long-running ones (e.g. changing distributed kvstore)
  • Build tests will be added for build configuration changes (e.g. adding a new build option with NCCL)
  • Code is well-documented:
  • For user-facing API changes, API doc string has been updated.
  • For new C++ functions in header files, their functionalities and arguments are documented.
  • For new examples, README.md is added to explain what the example does, the source of the dataset, expected performance on the test set, and a reference to the original paper if applicable
  • Check the API doc at https://mxnet-ci-doc.s3-accelerate.dualstack.amazonaws.com/PR-$PR_ID/$BUILD_ID/index.html
  • To the best of my knowledge, examples are either not affected by this change, or have been fixed to be compatible with this change

Changes

  • C++ implementation of the NAG optimizer with multi-precision support.

@eric-haibin-lin @ptrendx

@anirudhacharya
Member Author

anirudhacharya commented Mar 29, 2019

Still need to add proper doc strings for the update functions. The PR can be reviewed; cpplint has failed, which I will fix.

@abhinavs95
Contributor

@mxnet-label-bot add [Optimizer, pr-awaiting-review]

@marcoabreu marcoabreu added Optimizer pr-awaiting-review PR is waiting for code review labels Mar 29, 2019
@ptrendx
Member

ptrendx commented Mar 29, 2019

Cool :-) - I actually wanted to do this after finishing AMP. Since you used the same general layout as I did for SGD - do you think it would be beneficial to generalize it a little bit (so that adding more optimizers like this is easier in the future)?

@anirudhacharya
Member Author

@ptrendx MP_NAG_InferType can be generalized right away. I will see what else can be generalized. Do you have any specific suggestions?

@lupesko
Contributor

lupesko commented Mar 29, 2019

@anirudhacharya very cool! Would be great if you could comment a bit on how/when NAG delivers better results compared to other optimizers.

@pengzhao-intel
Contributor

@anirudhacharya @lupesko it's a great feature which is also very useful for CPU BF16 :)

Feel free to ping me if anything needs our team to cover.
@ZhennanQin @TaoLv

@piyushghai
Contributor

@anirudhacharya Can you look into the CI failures on this one?

@anirudhacharya
Member Author

@lupesko
Nesterov Accelerated Gradient (NAG) is an improvement over the SGD with Momentum optimizer.

As we know, SGD with Momentum accelerates the optimizer's descent toward the desired minimum by damping oscillations in the parameter updates: it adds a fraction of the previous time step's update to the current time step's update. This enables faster convergence of the model.

But the larger parameter updates can also cause the optimizer to overshoot the minimum. The NAG optimizer fixes this by changing the update rule so that the optimizer decelerates as it nears the minimum. This has shown better performance when training RNNs, as described in https://arxiv.org/abs/1212.0901. The following diagram illustrates the difference -


[Diagram comparing the Momentum and NAG update trajectories; image source: a Stack Exchange thread]
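
For reference, a sketch of the two update rules following the formulation in the paper linked above, with μ the momentum coefficient and ε the learning rate (this PR's C++ kernel may use an equivalent rearranged form):

```latex
% Classical momentum (SGD with Momentum):
\begin{aligned}
v_{t+1} &= \mu v_t - \varepsilon \nabla f(\theta_t) \\
\theta_{t+1} &= \theta_t + v_{t+1}
\end{aligned}

% Nesterov Accelerated Gradient: the gradient is evaluated at the
% looked-ahead point \theta_t + \mu v_t, so the step can slow down
% before it overshoots the minimum:
\begin{aligned}
v_{t+1} &= \mu v_t - \varepsilon \nabla f(\theta_t + \mu v_t) \\
\theta_{t+1} &= \theta_t + v_{t+1}
\end{aligned}
```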

This PR also adds multi-precision support to the NAG optimizer, which is very useful when training in fp16: the multi-precision optimizer keeps a master copy of the weights in fp32, while the forward and backward passes run in fp16 and the parameter updates are applied to the fp32 copy. This prevents loss of model accuracy while giving significant savings in memory and training time. For more details on mixed precision training, please see https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/
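
A minimal usage sketch, not taken from this PR's diff: it assumes the Python NAG optimizer accepts a `multi_precision` flag the same way `sgd` already does, so fp32 master weights are created automatically when the parameters are fp16.

```python
# Hypothetical end-to-end example of multi-precision NAG via Gluon.
# Assumes the 'nag' optimizer forwards multi_precision=True to the fused
# mp update added in this PR (not verified against the final API).
import mxnet as mx
from mxnet import autograd, gluon

net = gluon.nn.Dense(10)
net.initialize(mx.init.Xavier())
net.cast('float16')  # forward/backward run in fp16

trainer = gluon.Trainer(
    net.collect_params(), 'nag',
    {'learning_rate': 0.1, 'momentum': 0.9, 'multi_precision': True})

x = mx.nd.random.uniform(shape=(32, 20), dtype='float16')
y = mx.nd.random.uniform(shape=(32, 10), dtype='float16')
loss_fn = gluon.loss.L2Loss()

with autograd.record():
    loss = loss_fn(net(x), y)
loss.backward()
# The update is applied to the fp32 master weights, then cast back to fp16.
trainer.step(batch_size=32)
```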

Review comments on src/operator/optimizer_op.cc (4 threads, outdated and resolved)
@anirudhacharya
Member Author

anirudhacharya commented May 6, 2019

@mxnet-label-bot update [pr-awaiting-merge]

@marcoabreu marcoabreu removed the pr-awaiting-review PR is waiting for code review label May 6, 2019
@marcoabreu marcoabreu added pr-awaiting-merge Review and CI is complete. Ready to Merge and removed Optimizer labels May 6, 2019
@pinaraws

@nswamy @sandeep-krishnamurthy @anirudh2290 - Please review and merge

@anirudh2290 anirudh2290 merged commit 50495d7 into apache:master May 30, 2019
@anirudhacharya anirudhacharya deleted the nag_mp branch May 30, 2019 22:17
haohuanw pushed a commit to haohuanw/incubator-mxnet that referenced this pull request Jun 23, 2019
* nag_mp

* doc

* reuse sgd updates where convenient