Implemented amsgrad updates #887

Merged · 1 commit merged into Lasagne:master on Feb 21, 2018
Conversation

fdlm (Contributor) commented on Dec 22, 2017

This is an implementation of the AMSGrad algorithm as suggested in https://openreview.net/forum?id=ryQu7f-RZ

I don't have any tests for this yet. It seems that the existing algorithms in Lasagne are just checked against the Torch implementations. Suggestions?
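(For reference, the only change AMSGrad makes relative to Adam is that it divides by the running maximum of the second-moment estimate instead of the current estimate. A minimal NumPy sketch of the update rule, not the Theano code in this PR:)

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, t,
                 learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One AMSGrad step on a single parameter array (NumPy reference, not Theano)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # AMSGrad: keep the running maximum
    # Adam-style bias correction folded into the step size
    a_t = learning_rate * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    param = param - a_t * m / (np.sqrt(v_hat) + epsilon)
    return param, m, v, v_hat
```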

f0k (Member) commented on Dec 22, 2017

Thanks Filip, looks good, but it won't appear in the docs yet. Please amend your commit to add it to the module docstring at the top of the file, to __all__, and to docs/modules/updates.rst (directly after adamax in each case).

As the algorithms are so similar, we could make amsgrad a parameter for adam, but I think it'll be easier to find this way.

It seems that the existing algorithms in Lasagne just check against the Torch implementations. Suggestions?

Correct, that's how we set it up, as a sanity check. @ebenolson ran the Torch script whenever we needed new reference values. Unfortunately, torch.optim does not have amsgrad yet. Maybe we should just assume the code is correct (the change is minimal), run it with Lasagne, and use those values. This will not verify that the initial implementation is correct, but it will ensure it does not break in the future. Can you amend your commit accordingly? Let me know if you need help. Cheers!
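(For illustration, such a test could look roughly like the sketch below. This is a hypothetical cross-check against a plain NumPy reimplementation rather than the frozen-value test that ended up in the PR, and it assumes amsgrad follows the same signature and bias correction as Lasagne's adam:)

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne

def test_amsgrad_matches_numpy_reference():
    """Run lasagne.updates.amsgrad on a toy problem and compare to a NumPy loop."""
    x0 = np.linspace(0.1, 0.5, 5).astype(theano.config.floatX)

    # Lasagne side: minimize a simple quadratic for a few steps.
    x = theano.shared(x0.copy())
    loss = T.sum(x ** 2)
    updates = lasagne.updates.amsgrad(loss, [x], learning_rate=0.1)
    step = theano.function([], loss, updates=updates)
    for _ in range(10):
        step()

    # NumPy side: the same updates written out by hand (see the sketch above).
    p = x0.astype(np.float64)
    m = np.zeros_like(p)
    v = np.zeros_like(p)
    v_hat = np.zeros_like(p)
    lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
    for t in range(1, 11):
        g = 2.0 * p                                   # gradient of sum(p ** 2)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        a_t = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
        p = p - a_t * m / (np.sqrt(v_hat) + eps)

    np.testing.assert_allclose(x.get_value(), p, rtol=1e-4)
```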

fdlm (Contributor, Author) commented on Dec 23, 2017

As the algorithms are so similar, we could make amsgrad a parameter for adam, but I think it'll be easier to find this way.

Actually, Keras does it with a parameter (keras-team/keras@7b2ca43). I could change that, but keep an amsgrad function that just calls the new adam with the option turned on.
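(For illustration, the parameter-based variant being discussed might look roughly like this. This is a hypothetical API sketch, not the code that was merged, which keeps amsgrad as a separate function:)

```python
def adam(loss_or_grads, params, learning_rate=0.001, beta1=0.9,
         beta2=0.999, epsilon=1e-8, amsgrad=False):
    # Hypothetical: same update loop as the existing adam, but when
    # amsgrad=True, divide by T.maximum(v_hat_prev, v_t) instead of v_t.
    ...

def amsgrad(loss_or_grads, params, **kwargs):
    """Thin wrapper so the algorithm remains easy to find by name."""
    return adam(loss_or_grads, params, amsgrad=True, **kwargs)
```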

Thanks for the other comments, I will incorporate them after the holidays.

f0k (Member) commented on Dec 31, 2017

Actually, Keras does it with a parameter

The same goes for PyTorch.

I could change that, but keep an amsgrad function that just calls the new adam with the option turned on.

True, but it still sounds weird to have an amsgrad option for adam. As the code is stable (most likely not going to need any updates), I think it's fine to duplicate the function.

fdlm (Contributor, Author) commented on Jan 2, 2018

I just added a test case as you described it. I wanted to compare the values to the PyTorch implementation, but in PyTorch, even Adam gives slightly different results.

I agree that an amsgrad function looks better than an amsgrad option, so I left it as is.

fdlm (Contributor, Author) commented on Jan 4, 2018

I just stumbled upon this thread: https://www.reddit.com/r/MachineLearning/comments/7nw67c/d_pytorch_are_adam_and_rmsprop_okay/

There might be something wrong with PyTorch's Adam implementation, so maybe it's good that their current Adam gives different results :)

f0k merged commit 8978b1d into Lasagne:master on Feb 21, 2018
f0k (Member) commented on Feb 21, 2018

Added two commas that bugged me and merged! Thanks again!

I wanted to compare the values to the PyTorch implementation, but in PyTorch, even Adam gives slightly different results.

Hmm. In the reddit thread you posted, they say they "made sure that the rosenbrock convergence tests provide bitwise-exact results as the original Torch implementations". It may still be difficult to exactly reproduce Eben's setup in PyTorch.
