Added SELU nonlinearity #843
Conversation
Welcome, and thank you for the PR!
Indeed! It also needs tests so the coverage stays at 100%, and it must be added to the documentation. See http://lasagne.readthedocs.io/en/latest/user/development.html#how-to-contribute for how to build the documentation and run the test suite locally. This allows you to check whether everything looks fine and coverage is at 100%. Besides, there are some other issues with your PR:
Hi. Thanks for starting this PR! I think it should also include AlphaDropout, which keeps the mean and variance bounded, in contrast to normal Dropout. See:
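(For context, a hedged numpy sketch of what AlphaDropout does according to the paper — the function name and `rate` parameter are made up for illustration, and this is not the eventual Lasagne implementation:)

```python
import numpy as np

def alpha_dropout(x, rate=0.1, rng=np.random):
    # Dropped units are set to alpha' = -lambda * alpha (the SELU
    # saturation value) rather than 0, then an affine correction
    # restores zero mean and unit variance (Klambauer et al., 2017).
    x = np.asarray(x)
    scale = 1.0507009873554804934193349852946      # lambda
    scale_neg = 1.6732632423543772848170429916717  # alpha
    alpha_p = -scale * scale_neg
    q = 1.0 - rate                                 # keep probability
    mask = rng.binomial(1, q, size=x.shape).astype(x.dtype)
    a = (q + alpha_p ** 2 * q * (1 - q)) ** -0.5
    b = -a * alpha_p * (1 - q)
    return a * (x * mask + alpha_p * (1 - mask)) + b
```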
Yes, but I'd treat this as a separate PR; it's okay to add one after the other. @partobs-mdp: Will you still work on this or should somebody else take it? The paper is getting old ;)

@f0k Working (right now trying to build docs) - just had a hard time doing both university exams and the GSoC project 😞

No worries! Just wanted to check back. Let me know if you need help.
Speaking about docs, I can't run the build. UPDATE: Failed to paste the complete log (wtf?); the gist now contains the entire error log, including the final Python stack trace.

Aside from that problem and the tests, I seem to have implemented everything from @f0k's initial review comment - so now help on that doc issue would be very appreciated ;-) The current code (the one that reproduces the error I described above) is in the last commit. FWIW, I also attach the output of …
Sorry, this was fixed in #846, you'll need to rebase:

```
git fetch upstream master
git rebase upstream/master
git push --force
```

Or a shorter version:

```
git pull --rebase upstream master
git push --force
```

Let me know if it works!
It's getting there! Some more comments and it's still missing the tests.
```diff
@@ -14,6 +14,7 @@
     leaky_rectify
     very_leaky_rectify
     elu
+    SELU
```
also add `selu` here
```diff
@@ -33,6 +34,8 @@ Detailed description
 .. autofunction:: leaky_rectify
 .. autofunction:: very_leaky_rectify
 .. autofunction:: elu
+.. autoclass:: SELU
+    :members:
```
and `.. autofunction:: selu`
lasagne/nonlinearities.py
Outdated
```python
# selu
class SELU(object):
    """Scaled Exponential Linear Unit
    :math:`\\varphi(x) = \\lambda (x > 0 ? x : \\alpha(e^x-1)`
```
You're missing a closing bracket at the end.
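(For reference, a hedged numpy sketch of the intended formula with the bracket closed; `selu_numpy` and its argument defaults are illustrative only, not part of the PR:)

```python
import numpy as np

def selu_numpy(x, scale=1.0507, scale_neg=1.6733):
    # phi(x) = scale * (x if x > 0 else scale_neg * (exp(x) - 1))
    x = np.asarray(x, dtype=np.float64)
    return scale * np.where(x > 0, x, scale_neg * np.expm1(x))
```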
lasagne/nonlinearities.py
Outdated
```python
            self.scale_neg * (theano.tensor.exp(x) - 1))


selu = SELU(scale=1.0507, scale_neg=1.6733)
```
Just because we can, I'd use all digits given by the authors: https://github.com/bioinf-jku/SNNs/blob/49c5298/selu.py#L23-L24.
lasagne/nonlinearities.py
Outdated
```python
selu = SELU(scale=1.0507, scale_neg=1.6733)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha=1.6733, \\lambda=1.0507`
(the values that were recommended in the original paper or self-normalzing neworks)
```
Should be two sentences, because the first sentence is used for the overview table at the beginning of the module documentation. I'd suggest:

```
Instance of :class:`SELU` with :math:`\\alpha \\approx 1.6733, \\lambda \\approx 1.0507`.
This has a stable and attracting fixed point of :math:`\\mu=0`, :math:`\\sigma=1` under
the assumptions of the original paper on self-normalizing neural networks.
```
lasagne/nonlinearities.py
Outdated
```python
    See Also
    --------
    selu: Instance with :math:`\\alpha=1.6733, \\lambda=1.0507`, as recommended in [1]_.
```
Let's stay more neutral and say "as used in [1]_". They don't necessarily recommend it, at least not so generally.
lasagne/nonlinearities.py
Outdated
```python
        return self.scale * theano.tensor.switch(
            x > 0.0,
            x,
            self.scale_neg * (theano.tensor.exp(x) - 1))
```
We generally use 8 spaces for hanging indent, not 4. Also, I'd prefer `self.scale_neg * theano.tensor.expm1(x)` for the last case.
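(A hedged sketch of the class with both suggestions applied — 8-space hanging indent and `expm1`; the actual merged code may differ:)

```python
import theano.tensor

class SELU(object):
    def __init__(self, scale=1, scale_neg=1):
        self.scale = scale
        self.scale_neg = scale_neg

    def __call__(self, x):
        # expm1(x) computes exp(x) - 1 with better accuracy near zero.
        return self.scale * theano.tensor.switch(
                x > 0.0,
                x,
                self.scale_neg * theano.tensor.expm1(x))
```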
Looks like I've implemented everything from @f0k's review - now switching to writing tests.

Thanks, there are several PEP8 violations, though. But you'll also see them when you run the tests locally. Just remember to fix them as well.

Fixed PEP8 issues + added several tests for SELU.

Why is the review still red, by the way? I have implemented everything from it (or at least I think so).
The Travis CI build is over - it crashed on those three classes (LeakyRectify, ScaledTanh and SELU). What have I messed up?
You need blank lines after the ">>>" doctest sections, otherwise it thinks the next line should be the output.
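(To illustrate, a minimal made-up docstring showing the blank line a doctest needs after its expected output:)

```python
def double(x):
    """Double the input.

    Examples
    --------
    >>> double(2)
    4

    The blank line above ends the doctest; without it, this very line
    would be parsed as more expected output and the test would fail.
    """
    return 2 * x
```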
@ebenolson, thanks, looks like now I managed to get it right :)
Thank you, the tests look good! But you messed up something while fixing pep8 compliance, see my comment.
lasagne/nonlinearities.py
Outdated
```diff
@@ -9,12 +9,10 @@
 # sigmoid
 def sigmoid(x):
     """Sigmoid activation function :math:`\\varphi(x) = \\frac{1}{1 + e^{-x}}`
-
```
In commit 58904e3, you removed almost all blank lines in lasagne/nonlinearities.py. Please undo!
First of all, sorry for the late reply - I did something dumb while fixing the PEP8 compliance. Right now: waiting for the Travis CI build to find some test / PEP8 crashes to fix.

Well, one blank line was indeed extra (…)
Travis is happy, thanks! Some more minor comments, mostly about the rendering of the documentation. Please check these; see http://lasagne.readthedocs.io/en/latest/user/development.html#documentation for how.
lasagne/nonlinearities.py
Outdated
```diff
@@ -107,7 +107,6 @@ class ScaledTanH(object):
 
     By carefully matching :math:`\\alpha` and :math:`\\beta`, the nonlinearity
     can also be tuned to preserve the mean and variance of its input:
-
```
You missed this one... it will break the rendering of the documentation. Please always check the complete diff!
lasagne/nonlinearities.py
Outdated
```python
    See Also
    --------
    selu: Instance with :math:`\\alpha\\approx 1.6733,
    \\lambda\\approx 1.0507`, as used in [1]_.
```
I think you cannot break this into two lines. Please check how the HTML documentation renders. Possibly use `=` instead of `\\approx`. It's not entirely correct, but close enough.
lasagne/nonlinearities.py
Outdated
```python
    .. [1] Günter Klambauer, Thomas Unterthiner,
        Andreas Mayr, Sepp Hochreiter (2017):
        Self-Normalizing Neural Networks,
        https://arxiv.org/abs/1706.02515
```
I think you cannot mix indentations here, but I may be mistaken. Please check how the HTML documentation renders.
I tried to resolve the problem, but I simply can't make the line

```
.. [1] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter (2017):
```

shorter than 80 characters without tripping the PEP8 check.

Otherwise, I solved all the problems you mentioned. Waiting for your feedback ;)
lasagne/nonlinearities.py
Outdated
```python
        Self-Normalizing Neural Networks,
        https://arxiv.org/abs/1706.02515
    """

```
There shouldn't be a blank line here, compare to the other classes.
lasagne/tests/test_nonlinearities.py
Outdated
```python
        elif nonlinearity.startswith('selu'):
            from lasagne.nonlinearities import SELU, selu
            if nonlinearity == 'selu':
                theano_nonlinearity = SELU(scale=1, scale_neg=1)
```
I'd even remove the `scale=1, scale_neg=1` here, so the test also ensures nobody can change the default values without us noticing.
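(A hedged, standalone illustration of that point — calling `SELU()` with no arguments exercises the class defaults, so any accidental change to them would fail the check; the real test file is structured differently:)

```python
import numpy as np
import theano
import theano.tensor as T
from lasagne.nonlinearities import SELU

x = T.vector('x')
f = theano.function([x], SELU()(x))  # defaults: scale=1, scale_neg=1

xs = np.array([-1.0, 0.0, 1.0], dtype=theano.config.floatX)
expected = np.where(xs > 0, xs, np.expm1(xs))  # scale=scale_neg=1 is elu
assert np.allclose(f(xs), expected)
```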
lasagne/nonlinearities.py
Outdated
```python
            scale_neg=1.6732632423543772848170429916717)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
```
Can you please check whether this ends up in a single line in the overview table of the HTML documentation of the nonlinearities module? I'm not sure if the line break is a problem here. Cheers!
Trying to get the Lasagne dev version running on my …
Hmm, I don't immediately see what could be the reason. It doesn't seem to be the primary error, but you seem to be missing the …

Can you try editing docs/conf.py to include:

```python
theano.gpuarray = Mock()
sys.modules['theano.gpuarray'] = theano.gpuarray
sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True
```

I don't know why it works on my machine and fails on yours, though. It must be taking a different code path somewhere.
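(For context, a hedged sketch of how that mocking block would sit in a Sphinx `conf.py` — the imports and surrounding lines are assumptions, not the actual file contents:)

```python
import sys
from mock import Mock  # or: from unittest.mock import Mock

import theano

# Stub out the new GPU backend so Sphinx autodoc can import Lasagne
# modules without a working libgpuarray/cuDNN installation.
theano.gpuarray = Mock()
sys.modules['theano.gpuarray'] = theano.gpuarray
sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True
```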
No (wtf?), … After editing the file, …
What is the status of this PR? I seem to have done everything for it - maybe I am missing something?
To me it's not very pleasant that it has a different API from the other nonlinearities. I think it would be nicer to just have the two parameters as named parameters instead of set in the constructor. By default they could be as specified by the authors, and if people wish to do the equivalent of constructing the object as you do, they could use e.g. …

edit: Oops, just realised this is already done...
Thank you for the updates and sorry for the delay, I'm finishing up my thesis and cannot check up on Lasagne too frequently. I'll review the code in a minute.
Thank you, looks good! Just very minor things, and before merging, I'd like to squash this into two commits. This would go as follows (after fixing the remaining minor things):

```
git checkout -b backup master             # just in case
git checkout master
git fetch upstream master
git reset --soft upstream/master          # go to the latest upstream commit, but leave all other changes staged
git reset HEAD docs/conf.py               # unstage the sphinx config change
git commit -m 'Added SELU nonlinearity'   # commit all other changes
git add docs/conf.py                      # stage the sphinx config change
git commit -m 'Adapt sphinx configuration for new gpu backend'
git push --force
```
If you want, I can also take care of these last tweaks, just let me know!
```python
sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True
```
Technically, this should be a separate PR, but let's not overdo this, a separate commit will be good enough.
lasagne/nonlinearities.py
Outdated
```python
class SELU(object):
    """
    Scaled Exponential Linear Unit
    :math:`\\varphi(x)=\\lambda \\left[(x>0) ? x : \\alpha(e^x-1)\\right]`.
```
Just for consistency with the others, could you please remove the period at the end? (It's not a sentence.)
lasagne/nonlinearities.py
Outdated
```python
    scale_neg : float32
        The scale parameter :math:`\\alpha`
        for scaling output for negative argument values.
```
For consistency with the formula and implementation, it should be "nonpositive" instead of "negative" (nonpositive includes zero, negative doesn't).
lasagne/nonlinearities.py
Outdated
```python
    See Also
    --------
    selu: Instance with :math:`\\alpha=1.6733,\\lambda=1.0507` as used in [1]_.
```
I just checked: it's possible to go the correct route with `\\approx` if it is indented correctly:

```
See Also
--------
selu: Instance with :math:`\\alpha\\approx1.6733,\\lambda\\approx1.0507`
    as used in [1]_.
```

Could you do this?
lasagne/nonlinearities.py
Outdated
```python
selu.__doc__ = """selu(x)

Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
```
Again please remove the period just for consistency in the overview table in the beginning (sorry for the nitpick).
Squashed everything - I think now I've done everything for this to be merged :)

Wohoo! Merging, thanks again!

By the way, next time I'd recommend starting with …
```python
selu = SELU(scale=1.0507009873554804934193349852946,
            scale_neg=1.6732632423543772848170429916717)
```
I think it can be error-prone that the default values for `selu()` are different from those for `SELU()`. Why not make these values from the paper also the default for `SELU.__init__()`?
I see your point. But the `SELU` class implements the general concept of a scaled ELU, and `scale=1.0507009873554804934193349852946` is not exactly the scale that would come to mind first. Also, I think it's not easy to confuse

```python
DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.selu)
```

with

```python
DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.SELU())
```

Note that the difference is not only in the capitalization, but also in the brackets (to create an instance of the class). I think this will hardly happen by accident.

I'd be fine with adding a note to the class docstring that it defaults to `elu`, and that `selu` provides the version from the paper.
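(A hedged illustration of that distinction in a network definition — the layer shapes and sizes here are made up:)

```python
from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import SELU, selu

l_in = InputLayer((None, 100))
# Pre-built instance with the paper's constants (self-normalizing).
l1 = DenseLayer(l_in, 123, nonlinearity=selu)
# Fresh instance with the class defaults (scale=1, scale_neg=1),
# i.e. plain elu behaviour.
l2 = DenseLayer(l1, 123, nonlinearity=SELU())
```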
Implementing the SELU nonlinearity (inspired by this arXiv paper).

This is my first commit to Lasagne, so I don't quite know the rules (yet). I understand, though, that just implementing a nonlinearity is not good enough to make the cut, so I would like to ask: what else should I do? (extra examples, docs, ...)

Looking forward to your feedback ;)