NAG Optimizer with multi-precision support #14568
Conversation
Still need to add proper doc strings for the update functions. The PR can be reviewed; cpplint has failed, which I will fix |
@mxnet-label-bot add [Optimizer, pr-awaiting-review] |
Cool :-) - I actually wanted to do this after finishing AMP. Since you used the same general layout as I did for SGD - do you think it would be beneficial to generalize it a little bit (so that adding more optimizers like this is easier in the future)? |
@ptrendx |
@anirudhacharya very cool! Would be great if you could comment a bit on how/when NAG delivers better results compared to other optimizers. |
@anirudhacharya @lupesko it's a great feature which is also very useful for CPU BF16 :) Feel free to ping me if anything needs our team to cover. |
@anirudhacharya Can you look into the CI failures on this one? |
@lupesko SGD with momentum accelerates the optimizer's descent towards the desired minimum and reduces oscillations in the parameter updates by adding a fraction of the previous time step's update to the current update. This speeds up convergence, but the larger parameter updates can also cause the optimizer to overshoot the minimum. The NAG optimizer addresses this by changing the update rule so that the optimizer decelerates as it approaches the minimum. This has been shown to give better results when training RNNs, as described in https://arxiv.org/abs/1212.0901. The following diagram illustrates the difference, and the sketch below compares the two update rules -
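For concreteness, here is a minimal NumPy sketch of the two update rules. It uses one common reformulation of Nesterov momentum (the `grad + momentum * state` look-ahead term); the function names, hyperparameter values, and toy objective are illustrative only and not necessarily line-for-line what this PR's kernels do:

```python
import numpy as np

def momentum_update(weight, grad, state, lr, momentum):
    # Classical momentum: accumulate a velocity and step along it.
    state[:] = momentum * state + grad
    weight -= lr * state

def nag_update(weight, grad, state, lr, momentum):
    # Nesterov momentum (common reformulation): the step adds a
    # look-ahead term momentum * state on top of the gradient, so the
    # velocity gets corrected earlier when it starts to overshoot.
    state[:] = momentum * state + grad
    weight -= lr * (grad + momentum * state)

# Toy 1-D quadratic f(w) = w^2 with gradient 2*w, just to show the API shape.
w_mom, w_nag = np.array([5.0]), np.array([5.0])
s_mom, s_nag = np.zeros(1), np.zeros(1)
for _ in range(50):
    momentum_update(w_mom, 2 * w_mom, s_mom, lr=0.1, momentum=0.9)
    nag_update(w_nag, 2 * w_nag, s_nag, lr=0.1, momentum=0.9)
print(w_mom, w_nag)  # NAG typically oscillates less around the minimum
```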
This PR also adds multi-precision support to the NAG optimizer, which is very useful when training in fp16: the optimizer keeps a master copy of the weights in fp32, the forward and backward passes run in fp16, and the parameter update is applied to the fp32 copy before being cast back to fp16. This helps preserve model accuracy while giving significant savings in memory and training time. For more details on mixed precision training, please see https://devblogs.nvidia.com/mixed-precision-training-deep-neural-networks/ |
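A minimal Gluon usage sketch, assuming the optimizer is registered under the name 'nag' and accepts the same `multi_precision` flag as SGD; the network, shapes, and hyperparameters are placeholders:

```python
import mxnet as mx
from mxnet import gluon

# A tiny fp16 model; in practice this would be a real network.
net = gluon.nn.Dense(1)
net.initialize()
net.cast('float16')

# multi_precision=True asks the optimizer to keep an fp32 master copy of
# the weights even though the parameters and gradients are fp16.
trainer = gluon.Trainer(
    net.collect_params(), 'nag',
    {'learning_rate': 0.1, 'momentum': 0.9, 'multi_precision': True})

x = mx.nd.ones((4, 8), dtype='float16')
with mx.autograd.record():
    loss = net(x).sum()
loss.backward()
trainer.step(batch_size=4)
```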
@mxnet-label-bot update [pr-awaiting-merge] |
@nswamy @sandeep-krishnamurthy @anirudh2290 - Please review and merge |
* nag_mp
* doc
* reuse sgd updates where convenient
Description
NAG Optimizer with multi-precision support. Tests already exist for this.
Checklist
Essentials
Please feel free to remove inapplicable items for your PR.
Changes
@eric-haibin-lin @ptrendx