This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

How does the learning rate decay? #304

Closed
yapingzhao opened this issue Apr 13, 2018 · 1 comment

Comments

@yapingzhao

Hi,

optimizer

parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam")
parser.add_argument("--learning_rate", type=float, default=1.0,
help="Learning rate. Adam: 0.001 | 0.0001")
parser.add_argument("--warmup_steps", type=int, default=0,
help="How many steps we inverse-decay learning.")
parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""
How to warmup learning rates. Options include:
t2t: Tensor2Tensor's way, start with lr 100 times smaller, then
exponentiate until the specified lr.
""")

The learning rate is set to 1.0 when training the model; isn't that a bit too large? And why does the learning rate printed during training stay at 1 the whole time? I don't understand how this learning rate decays.
Looking forward to your advice or answers.
Best regards,

@YangFei1990

Have you solved this problem? I'm running into the same issue.
