[Test] Add more test cases #45

Merged
merged 16 commits on Jan 29, 2022
refactor: AdaBound
kozistr committed Jan 29, 2022
commit 055a5fc88131103fd47bd31c47d19fc003f42108
4 changes: 2 additions & 2 deletions pytorch_optimizer/adabound.py
@@ -37,15 +37,15 @@ def __init__(
adamd_debias_term: bool = False,
eps: float = 1e-8,
):
"""
"""AdaBound
:param params: PARAMETERS. iterable of parameters to optimize or dicts defining parameter groups
:param lr: float. learning rate
:param final_lr: float. final learning rate
:param betas: BETAS. coefficients used for computing running averages of gradient and the squared hessian trace
:param gamma: float. convergence speed of the bound functions
:param weight_decay: float. weight decay (L2 penalty)
:param weight_decouple: bool. the optimizer uses decoupled weight decay as in AdamW
- :param fixed_decay: bool.
+ :param fixed_decay: bool. fix weight decay
:param amsbound: bool. whether to use the AMSBound variant
:param adamd_debias_term: bool. Only correct the denominator to avoid inflating step sizes early in training
:param eps: float. term added to the denominator to improve numerical stability
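For context, here is a minimal usage sketch of the constructor documented above. The import path is assumed from the diff's file location (pytorch_optimizer/adabound.py), and the hyper-parameter values are illustrative rather than the library's documented defaults; the parameter names themselves come from the docstring in this diff.

```python
import torch
from pytorch_optimizer import AdaBound  # assumed public export

model = torch.nn.Linear(10, 2)

optimizer = AdaBound(
    model.parameters(),       # params: iterable of parameters to optimize
    lr=1e-3,                  # initial learning rate
    final_lr=0.1,             # final learning rate the bound functions approach
    betas=(0.9, 0.999),       # coefficients for the running averages
    gamma=1e-3,               # convergence speed of the bound functions
    weight_decay=0.0,         # weight decay (L2 penalty)
    weight_decouple=True,     # decoupled weight decay as in AdamW
    fixed_decay=False,        # fix weight decay
    amsbound=False,           # whether to use the AMSBound variant
    adamd_debias_term=False,  # only debias the denominator
    eps=1e-8,                 # numerical stability term
)

# One illustrative training step.
loss = model(torch.randn(4, 10)).sum()
loss.backward()
optimizer.step()
optimizer.zero_grad()
```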