This repository has been archived by the owner on Nov 17, 2023. It is now read-only.

LBSGD documentation fix #13465

Merged 1 commit on Dec 5, 2018
7 changes: 5 additions & 2 deletions python/mxnet/optimizer/optimizer.py
@@ -686,8 +686,11 @@ class LBSGD(Optimizer):
state = momentum * state + lr * rescale_grad * clip(grad, clip_gradient) + wd * weight
weight = weight - state
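
As an aside, the update rule above can be sketched in plain NumPy. This is a
minimal illustration of the docstring's pseudocode, not MXNet's actual fused
kernel; the function name and NumPy types are assumptions:

    import numpy as np

    def sgd_mom_step(weight, state, grad, lr,
                     momentum=0.9, wd=0.0,
                     rescale_grad=1.0, clip_gradient=None):
        # Mirrors the pseudocode: clip the gradient, rescale it,
        # fold in weight decay, then apply the momentum update.
        if clip_gradient is not None:
            grad = np.clip(grad, -clip_gradient, clip_gradient)
        state = momentum * state + lr * rescale_grad * grad + wd * weight
        weight = weight - state
        return weight, state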

- For details of the update algorithm see :class:`~mxnet.ndarray.lbsgd_update` and
- :class:`~mxnet.ndarray.lbsgd_mom_update`.
+ For details of the update algorithm see :class:`~mxnet.ndarray.sgd_update`
+ and :class:`~mxnet.ndarray.sgd_mom_update`.
+ In addition to the SGD updates, the LBSGD optimizer uses LARS (Layer-wise
+ Adaptive Rate Scaling) to maintain a separate learning rate for each layer
+ of the network, which leads to better stability with large batch sizes.
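
For context, LARS sets each layer's effective learning rate from the ratio of
the layer's weight norm to its gradient norm. Below is a minimal sketch
following the LARS paper (You et al., 2017); the trust coefficient `eta` and
the epsilon term are assumptions, and MXNet's implementation may differ in
detail:

    import numpy as np

    def lars_layer_lr(weight, grad, base_lr, eta=0.001, wd=0.0, eps=1e-9):
        # Trust ratio: ||w|| / (||g|| + wd * ||w||), scaled by eta.
        w_norm = np.linalg.norm(weight)
        g_norm = np.linalg.norm(grad)
        if w_norm > 0.0 and g_norm > 0.0:
            return base_lr * eta * w_norm / (g_norm + wd * w_norm + eps)
        return base_lr  # fall back to the global rate for degenerate layers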

This optimizer accepts the following parameters in addition to those accepted
by :class:`.Optimizer`.
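
A hypothetical usage sketch, setting only hyperparameters that appear in the
pseudocode above (the truncated diff does not show LBSGD's own extra
parameters, so none of those are guessed here):

    import mxnet as mx

    # Construct the optimizer; learning_rate, wd, and clip_gradient come
    # from the Optimizer base class, momentum from the SGD-style update.
    opt = mx.optimizer.LBSGD(learning_rate=0.1, momentum=0.9,
                             wd=1e-4, clip_gradient=5.0)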