This repository has been archived by the owner on Dec 11, 2023. It is now read-only.

How does the learning rate decay? #304

Closed
yapingzhao opened this issue Apr 13, 2018 · 1 comment

Comments

@yapingzhao

Hi,

optimizer

parser.add_argument("--optimizer", type=str, default="sgd", help="sgd | adam")
parser.add_argument("--learning_rate", type=float, default=1.0,
help="Learning rate. Adam: 0.001 | 0.0001")
parser.add_argument("--warmup_steps", type=int, default=0,
help="How many steps we inverse-decay learning.")
parser.add_argument("--warmup_scheme", type=str, default="t2t", help="""
How to warmup learning rates. Options include:
t2t: Tensor2Tensor's way, start with lr 100 times smaller, then
exponentiate until the specified lr.
""")

The learning rate is set to 1.0 when training the model; isn't that a bit too large? And why does the learning rate printed during training stay at 1 the whole time? I don't understand how this learning rate decays.
Looking forward to your advice or answers.
Best regards,

@YangFei1990

Have you solved this problem? I'm running into the same issue.
