Implemented amsgrad updates #887

Merged · 1 commit merged into Lasagne:master on Feb 21, 2018
Conversation

fdlm (Contributor) commented on Dec 22, 2017

This is an implementation of the AMSGrad algorithm as suggested in https://openreview.net/forum?id=ryQu7f-RZ

I don't have any tests for this yet. It seems that the existing algorithms in Lasagne are just checked against the Torch implementations. Suggestions?
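(For reference, the only change AMSGrad makes relative to Adam is that it divides by the running maximum of the second-moment estimate instead of the current estimate. A minimal NumPy sketch of the update rule, not the Theano code in this PR:)

```python
import numpy as np

def amsgrad_step(param, grad, m, v, v_hat, t,
                 learning_rate=1e-3, beta1=0.9, beta2=0.999, epsilon=1e-8):
    """One AMSGrad step on a single parameter array (NumPy reference, not Theano)."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment estimate, as in Adam
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate, as in Adam
    v_hat = np.maximum(v_hat, v)              # AMSGrad: keep the running maximum
    # Adam-style bias correction folded into the step size
    a_t = learning_rate * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    param = param - a_t * m / (np.sqrt(v_hat) + epsilon)
    return param, m, v, v_hat
```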

f0k (Member) commented on Dec 22, 2017

Thanks Filip, looks good, but it won't appear in the docs yet. Please amend your commit to add it to the module docstring at the top of the file, to __all__, and to docs/modules/updates.rst (directly after adamax in each case).

As the algorithms are so similar, we could make amsgrad a parameter for adam, but I think it'll be easier to find this way.

It seems that the existing algorithms in Lasagne just check against the Torch implementations. Suggestions?

Correct, that's how we set it up, as a sanity check. @ebenolson ran the Torch script whenever we needed new reference values. Unfortunately, torch.optim does not have amsgrad yet. Maybe we should just assume the code is correct (the change is minimal), run it with Lasagne, and use those values. This will not verify that the initial implementation is correct, but it will ensure it does not break in the future. Can you amend your commit accordingly? Let me know if you need help. Cheers!
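(For illustration, such a test could look roughly like the sketch below. This is a hypothetical cross-check against a plain NumPy reimplementation rather than the frozen-value test that ended up in the PR, and it assumes amsgrad follows the same signature and bias correction as Lasagne's adam:)

```python
import numpy as np
import theano
import theano.tensor as T
import lasagne

def test_amsgrad_matches_numpy_reference():
    """Run lasagne.updates.amsgrad on a toy problem and compare to a NumPy loop."""
    x0 = np.linspace(0.1, 0.5, 5).astype(theano.config.floatX)

    # Lasagne side: minimize a simple quadratic for a few steps.
    x = theano.shared(x0.copy())
    loss = T.sum(x ** 2)
    updates = lasagne.updates.amsgrad(loss, [x], learning_rate=0.1)
    step = theano.function([], loss, updates=updates)
    for _ in range(10):
        step()

    # NumPy side: the same updates written out by hand (see the sketch above).
    p = x0.astype(np.float64)
    m = np.zeros_like(p)
    v = np.zeros_like(p)
    v_hat = np.zeros_like(p)
    lr, b1, b2, eps = 0.1, 0.9, 0.999, 1e-8
    for t in range(1, 11):
        g = 2.0 * p                                   # gradient of sum(p ** 2)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        a_t = lr * np.sqrt(1 - b2 ** t) / (1 - b1 ** t)
        p = p - a_t * m / (np.sqrt(v_hat) + eps)

    np.testing.assert_allclose(x.get_value(), p, rtol=1e-4)
```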

fdlm (Contributor, Author) commented on Dec 23, 2017

As the algorithms are so similar, we could make amsgrad a parameter for adam, but I think it'll be easier to find this way.

Actually, Keras does it with a parameter (keras-team/keras@7b2ca43). I could change that, but keep an amsgrad function that just calls the new adam with the option turned on.
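(For illustration, the parameter-based variant being discussed might look roughly like this. This is a hypothetical API sketch, not the code that was merged, which keeps amsgrad as a separate function:)

```python
def adam(loss_or_grads, params, learning_rate=0.001, beta1=0.9,
         beta2=0.999, epsilon=1e-8, amsgrad=False):
    # Hypothetical: same update loop as the existing adam, but when
    # amsgrad=True, divide by T.maximum(v_hat_prev, v_t) instead of v_t.
    ...

def amsgrad(loss_or_grads, params, **kwargs):
    """Thin wrapper so the algorithm remains easy to find by name."""
    return adam(loss_or_grads, params, amsgrad=True, **kwargs)
```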

Thanks for the other comments, I will incorporate them after the holidays.

f0k (Member) commented on Dec 31, 2017

Actually, Keras does it with a parameter

The same goes for PyTorch.

I could change that, but keep an amsgrad function that just calls the new adam with the option turned on.

True, but it still sounds weird to have an amsgrad option for adam. As the code is stable (most likely not going to need any updates), I think it's fine to duplicate the function.

fdlm (Contributor, Author) commented on Jan 2, 2018

I just added a test case as you described it. I wanted to compare the values to the PyTorch implementation, but in PyTorch, even Adam gives slightly different results.

I agree that an amsgrad function looks better than an amsgrad option, so I left it as is.

fdlm (Contributor, Author) commented on Jan 4, 2018

I just stumbled upon this thread: https://www.reddit.com/r/MachineLearning/comments/7nw67c/d_pytorch_are_adam_and_rmsprop_okay/

There might be something wrong with PyTorch's Adam implementation, so maybe it's good that their current Adam gives different results :)

f0k merged commit 8978b1d into Lasagne:master on Feb 21, 2018
f0k (Member) commented on Feb 21, 2018

Added two commas that bugged me and merged! Thanks again!

I wanted to compare the values to the PyTorch implementation, but in PyTorch, even Adam gives slightly different results.

Hmm. In the reddit thread you posted, they say they "made sure that the rosenbrock convergence tests provide bitwise-exact results as the original Torch implementations". It may still be difficult to exactly reproduce Eben's setup in PyTorch.
