[Test] Add more test cases #45

Merged: 16 commits, Jan 29, 2022
update: RAdam
kozistr committed Jan 29, 2022
commit 6fb3a4f25ec62a1c50aa61808b95006a8372bf95
6 changes: 3 additions & 3 deletions pytorch_optimizer/radam.py
@@ -35,13 +35,13 @@ def __init__(
         adamd_debias_term: bool = False,
         eps: float = 1e-8,
     ):
-        """
+        """RAdam
         :param params: PARAMETERS. iterable of parameters to optimize or dicts defining parameter groups
-        :param lr: float. learning rate.
+        :param lr: float. learning rate
         :param betas: BETAS. coefficients used for computing running averages of gradient and the squared hessian trace
         :param weight_decay: float. weight decay (L2 penalty)
         :param n_sma_threshold: int. (recommended is 5)
-        :param degenerated_to_sgd: float.
+        :param degenerated_to_sgd: float. degenerated to SGD
         :param adamd_debias_term: bool. Only correct the denominator to avoid inflating step sizes early in training
         :param eps: float. term added to the denominator to improve numerical stability
         """
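For readers of this diff, a minimal usage sketch of the constructor whose docstring is updated above. The import path `pytorch_optimizer.RAdam` and the toy model are assumptions for illustration, not part of the change itself; parameters not passed fall back to their defaults from the signature.

import torch
from pytorch_optimizer import RAdam  # assumed import path for this repository's package

# toy model, for illustration only
model = torch.nn.Linear(10, 1)

optimizer = RAdam(
    model.parameters(),   # params: iterable of parameters to optimize
    lr=1e-3,              # learning rate
    betas=(0.9, 0.999),   # coefficients for the running averages
    weight_decay=0.0,     # L2 penalty
    n_sma_threshold=5,    # recommended value per the docstring
    eps=1e-8,             # numerical-stability term added to the denominator
)

# standard PyTorch training step
loss = model(torch.randn(4, 10)).pow(2).mean()
loss.backward()
optimizer.step()
optimizer.zero_grad()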