
Added SELU nonlinearity #843

Merged (2 commits into Lasagne:master, Jul 12, 2017)
Conversation

sidorov-ks (Contributor)

Implementing SELU nonlinearity (inspired by this arXiv paper)

This is my first commit to Lasagne, so I don't quite know the rules (yet). I understand, though, that just implementing a nonlinearity is not good enough to make the cut, so I would like to ask what else I should do. (Extra examples, docs, ...?)

Looking forward to your feedback ;)

@f0k (Member) commented Jun 14, 2017

Welcome, and thank you for the PR!

I understand, though, that just implementing a nonlinearity is not good enough to make the cut

Indeed! It also needs tests so the coverage stays at 100%, and it must be added to the nonlinearities.rst file so it shows up in the documentation. See the changesets of the closely related ELU (https://github.com/Lasagne/Lasagne/pull/518/files) and scaled tanh (https://github.com/Lasagne/Lasagne/pull/430/files).

See http://lasagne.readthedocs.io/en/latest/user/development.html#how-to-contribute for how to build the documentation and run the test suite locally. This lets you check that everything looks fine and coverage stays at 100%.

Besides, there are some other issues with your PR; see the inline review comments below.

@gyglim (Contributor) commented Jun 16, 2017

Hi, thanks for starting this PR!

It should also include AlphaDropout, which keeps the mean and variance bounded, in contrast to normal dropout. See:
keras-team/keras#6924 (comment)
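For reference, a rough NumPy sketch of the alpha dropout idea from the paper - not part of this PR; the function name and keep probability here are illustrative:

import numpy as np

def alpha_dropout(x, q=0.9, rng=np.random):
    # dropped units are set to the SELU saturation value
    # alpha' = -lambda * alpha, then an affine correction a*x + b
    # restores zero mean and unit variance (q is the keep probability)
    alpha_p = -1.0507009873554805 * 1.6732632423543772
    a = (q + alpha_p ** 2 * q * (1 - q)) ** -0.5
    b = -a * (1 - q) * alpha_p
    mask = rng.binomial(1, q, size=x.shape)
    return a * np.where(mask, x, alpha_p) + b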

@f0k (Member) commented Jun 21, 2017

It should also include AlphaDropout,

Yes, but I'd treat this as a separate PR; it's fine to add one after the other.

@partobs-mdp: Will you still work on this or should somebody else take it? The paper is getting old ;)

@sidorov-ks (Contributor, Author)

@f0k I'm working on it (right now trying to build the docs) - I just had a hard time juggling university exams and my GSoC project 😞

@f0k (Member) commented Jun 21, 2017

just had a hard time juggling university exams and my GSoC project

No worries! Just wanted to check back. Let me know if you need help.

@sidorov-ks (Contributor, Author) commented Jun 21, 2017

Speaking of docs, I can't run make html because I run into some weird bug: Theano crashes while building the docs. Log: https://gist.github.com/partobs-mdp/04b70d4a80d229c541931e8b22324999

I'm running make html from my venv (make html SPHINXBUILD='python /usr/bin/sphinx-build' doesn't help either).

UPDATE: I failed to paste the complete log at first; the gist now contains the entire error log, including the final Python stack trace.

@sidorov-ks (Contributor, Author)

Aside from that problem and the tests, I seem to have implemented everything from @f0k's initial review comment - so help on that doc issue would be much appreciated ;-)

The current code (the one that reproduces the error I described above) is in the last commit.

FWIW, I also attach the output of make html run without the venv: https://gist.github.com/partobs-mdp/8517691031c93035fee50dd2a101271d

@f0k (Member) commented Jun 21, 2017

Speaking of docs, I can't run make html

Sorry, this was fixed in #846; you'll need to rebase:

git fetch upstream master
git rebase upstream/master
git push --force

Or a shorter version:

git pull --rebase upstream master
git push --force

Let me know if it works!

@f0k (Member) left a comment

It's getting there! A few more comments below, and it's still missing the tests.

@@ -14,6 +14,7 @@
     leaky_rectify
     very_leaky_rectify
     elu
+    SELU
f0k (Member):

also add selu here

@@ -33,6 +34,8 @@ Detailed description
 .. autofunction:: leaky_rectify
 .. autofunction:: very_leaky_rectify
 .. autofunction:: elu
+.. autoclass:: SELU
+    :members:
f0k (Member):

and .. autofunction:: selu

# selu
class SELU(object):
    """Scaled Exponential Linear Unit
    :math:`\\varphi(x) = \\lambda (x > 0 ? x : \\alpha(e^x-1)`
f0k (Member):

You're missing a closing bracket at the end.

            self.scale_neg * (theano.tensor.exp(x) - 1))


selu = SELU(scale=1.0507, scale_neg=1.6733)
f0k (Member):

Just because we can, I'd use all digits given by the authors: https://github.com/bioinf-jku/SNNs/blob/49c5298/selu.py#L23-L24.
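For reference, the full-precision values from that file (they also appear in the later diffs in this thread):

scale     = 1.0507009873554804934193349852946
scale_neg = 1.6732632423543772848170429916717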

selu = SELU(scale=1.0507, scale_neg=1.6733)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha=1.6733, \\lambda=1.0507`
(the values recommended in the original paper on self-normalizing networks)
f0k (Member):

Should be two sentences, because the first sentence is used for the overview table at the beginning of the module documentation. I'd suggest:

Instance of :class:`SELU` with :math:`\\alpha \\approx 1.6733, \\lambda \\approx 1.0507`.
This has a stable and attracting fixed point of :math:`\\mu=0`, :math:`\\sigma=1` under
the assumptions of the original paper on self-normalizing neural networks.


See Also
--------
selu: Instance with :math:`\\alpha=1.6733, \\lambda=1.0507`, as recommended in [1]_.
f0k (Member):

Let's stay more neutral and say as used in [1]_. They don't necessarily recommend it, at least not so generally.

        return self.scale * theano.tensor.switch(
            x > 0.0,
            x,
            self.scale_neg * (theano.tensor.exp(x) - 1))
f0k (Member):

We generally use 8 spaces for hanging indent, not 4. Also I'd prefer self.scale_neg * theano.tensor.expm1(x) for the last case.
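Pulling these suggestions together, the nonlinearity would end up roughly like this - a sketch for orientation, not necessarily the exact merged code:

import theano.tensor

class SELU(object):
    def __init__(self, scale=1, scale_neg=1):
        self.scale = scale
        self.scale_neg = scale_neg

    def __call__(self, x):
        # 8-space hanging indent and expm1, as suggested above
        return self.scale * theano.tensor.switch(
                x > 0.0,
                x,
                self.scale_neg * theano.tensor.expm1(x))

# instance with the full-precision constants from the paper
selu = SELU(scale=1.0507009873554804934193349852946,
            scale_neg=1.6732632423543772848170429916717)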

@sidorov-ks (Contributor, Author)

Looks like I've implemented everything from @f0k's review - now switching to writing tests.

@f0k (Member) commented Jun 23, 2017

Thanks! There are several PEP8 violations, though; you'll also see them when you run the tests locally. Just remember to fix them as well.

@sidorov-ks (Contributor, Author)

Fixed the PEP8 issues and added several tests for SELU. There is one weird issue in doctest (it also arises in other class-based nonlinearities, e.g. ScaledTanH); I don't really know what to do about it.

@sidorov-ks (Contributor, Author)

Why is the review still red, by the way? I have implemented everything from it (or at least I think so).

@sidorov-ks (Contributor, Author)

The Travis CI build has finished - it crashed on those three classes (LeakyRectify, ScaledTanH and SELU). What have I messed up?

@ebenolson (Member) commented Jun 23, 2017 via email

@sidorov-ks (Contributor, Author)

@ebenolson, thanks - looks like I've managed to get it right now :)

@f0k (Member) left a comment

Thank you, the tests look good! But you messed something up while fixing PEP8 compliance; see my comment.

@@ -9,12 +9,10 @@
 # sigmoid
 def sigmoid(x):
     """Sigmoid activation function :math:`\\varphi(x) = \\frac{1}{1 + e^{-x}}`

f0k (Member):

In commit 58904e3, you removed almost all blank lines in lasagne/nonlinearities.py. Please undo!

@sidorov-ks (Contributor, Author) commented Jun 27, 2017

First of all, sorry for the late reply - I did something dumb while running pacman -Syua. As a result, I'm now back to a fresh Ubuntu install without Anaconda and other good stuff. However, I do have git and vim now 😃, so I have restored the blank lines as requested in your review.

Right now I'm waiting for the Travis CI build to surface any test / PEP8 failures to fix.

@sidorov-ks (Contributor, Author)

Well, one blank line was indeed extra (nonlinearities.py:273). Fixed that, and I think this check will now run smoothly.

@f0k (Member) left a comment

Travis is happy, thanks! Some more minor comments, mostly about the rendering of the documentation. Please check these; see http://lasagne.readthedocs.io/en/latest/user/development.html#documentation for how.

@@ -107,7 +107,6 @@ class ScaledTanH(object):

     By carefully matching :math:`\\alpha` and :math:`\\beta`, the nonlinearity
     can also be tuned to preserve the mean and variance of its input:
-
f0k (Member):

You missed this one... it will break the rendering of the documentation. Please always check the complete diff!

    See Also
    --------
    selu: Instance with :math:`\\alpha\\approx 1.6733,
        \\lambda\\approx 1.0507`, as used in [1]_.
f0k (Member):

I think you cannot break this into two lines. Please check how the HTML documentation renders. Possibly use = instead of \\approx. It's not entirely correct, but close enough.

.. [1] Günter Klambauer, Thomas Unterthiner,
Andreas Mayr, Sepp Hochreiter (2017):
Self-Normalizing Neural Networks,
https://arxiv.org/abs/1706.02515
f0k (Member):

I think you cannot mix indentations here, but I may be mistaken. Please check how the HTML documentation renders.

@sidorov-ks (Contributor, Author):

I tried to resolve the problem, but I simply can't get the line

.. [1] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter (2017):

under 80 characters, which trips the PEP8 check.

Otherwise, I solved all the problems you mentioned. Waiting for your feedback ;)

Self-Normalizing Neural Networks,
https://arxiv.org/abs/1706.02515
"""

f0k (Member):

There shouldn't be a blank line here, compare to the other classes.

elif nonlinearity.startswith('selu'):
    from lasagne.nonlinearities import SELU, selu
    if nonlinearity == 'selu':
        theano_nonlinearity = SELU(scale=1, scale_neg=1)
f0k (Member):

I'd even remove the scale=1, scale_neg=1 here, so the test also ensures nobody can change the default values without us noticing.

            scale_neg=1.6732632423543772848170429916717)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
f0k (Member):

Can you please check whether this ends up in a single line in the overview table of the HTML documentation of the nonlinearities module? I'm not sure if the line break is a problem here. Cheers!

@sidorov-ks (Contributor, Author)

Trying to get the Lasagne dev version running on my <sigh> new Ubuntu 16.04 LTS, but I ran into two strange bugs (quoted in the reply below):

@f0k (Member) commented Jun 30, 2017

Py.test bug - py.test run fails to collect any items.

Hmm, I don't immediately see what could be the reason. It doesn't seem to be the primary error, but you seem to be missing the mock module. Did you install everything from the requirements-dev.txt file?

Sphinx bug

Can you try editing lasagne/docs/conf.py so that before sys.modules['pylearn2'] = Mock() you have:

theano.gpuarray = Mock()
sys.modules['theano.gpuarray'] = theano.gpuarray
sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True

I don't know why it works on my machine and fails on yours, though. It must be taking a different code path somewhere.

@sidorov-ks (Contributor, Author)

Did you install everything from the requirements-dev.txt file?

No (wtf?), the mock module was indeed missing. The test suite now runs smoothly.

Sphinx bug

After editing the file, make html works - thanks!

@sidorov-ks (Contributor, Author)

What is the status of this PR? I seem to have done everything for it - am I missing something?

@reynoldscem commented Jul 11, 2017

To me it's not very pleasant that it has a different API from the other nonlinearities.

I think it would be nicer to just have the two parameters as named parameters instead of setting them in the constructor. By default they could be as specified by the authors, and if people want the equivalent of constructing the object as you do, they can use functools.partial or similar.

e.g.

from functools import partial
from lasagne.nonlinearities import selu
selu = partial(selu, scale=1., scale_neg=1.)

edit: Oops, just realised this is already done...

https://github.com/Lasagne/Lasagne/pull/843/files/1631ec6a0121f80b51b99da41c6a3704f29a827d#diff-e803e2c6fef365217ea4b830787511e0R331

@f0k (Member) commented Jul 11, 2017

What is the status of this PR?

Thank you for the updates and sorry for the delay, I'm finishing up my thesis and cannot check up on Lasagne too frequently. I'll review the code in a minute.

I think it would be nicer to just have the two parameters as named parameters instead of setting them in the constructor.

functools.partial is not commonly known, and my_selu = functools.partial(lasagne.nonlinearities.selu, scale=a, scale_neg=b) is not any easier than my_selu = lasagne.nonlinearities.SELU(scale=a, scale_neg=b). So we chose to use classes to implement nonlinearities that have additional parameters, such as leaky ReLU or scaled tanh. For common use cases, there are predefined instances that can be used directly (as you noticed yourself afterwards).
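In practice the two options look like this (a sketch; the input layer and the custom parameter values are illustrative):

from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import selu, SELU

layer = InputLayer((None, 100))  # some existing layer

# predefined instance with the constants from the paper
l1 = DenseLayer(layer, num_units=123, nonlinearity=selu)

# custom instance of the underlying class
l2 = DenseLayer(layer, num_units=123,
                nonlinearity=SELU(scale=1.05, scale_neg=1.67))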

@f0k (Member) left a comment

Thank you, looks good! Just very minor things, and before merging, I'd like to squash this into two commits. This would go as follows (after fixing the remaining minor things):

git checkout -b backup master  # just in case
git checkout master
git fetch upstream master
git reset --soft upstream/master  # go to the latest upstream commit, but leave all other changes staged
git reset HEAD docs/conf.py  # unstage the sphinx config change
git commit -m 'Added SELU nonlinearity'  # commit all other changes
git add docs/conf.py  # stage the sphinx config change
git commit -m 'Adapt sphinx configuration for new gpu backend'
git push --force

If you want, I can also take care of these last tweaks, just let me know!

sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True

f0k (Member):

Technically, this should be a separate PR, but let's not overdo this, a separate commit will be good enough.

class SELU(object):
    """
    Scaled Exponential Linear Unit
    :math:`\\varphi(x)=\\lambda \\left[(x>0) ? x : \\alpha(e^x-1)\\right]`.
f0k (Member):

Just for consistency with the others, could you please remove the period at the end? (It's not a sentence.)


    scale_neg : float32
        The scale parameter :math:`\\alpha`
        for scaling output for negative argument values.
f0k (Member):

For consistency with the formula and implementation, it should be nonpositive instead of negative (nonpositive includes zero, negative doesn't).


See Also
--------
selu: Instance with :math:`\\alpha=1.6733,\\lambda=1.0507` as used in [1]_.
f0k (Member):

I just checked: It's possible to go the correct route with \\approx if it is indented correctly:

    See Also
    --------
    selu: Instance with :math:`\\alpha\\approx1.6733,\\lambda\\approx1.0507`
          as used in [1]_.

Could you do this?

selu.__doc__ = """selu(x)

Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
f0k (Member):

Again, please remove the period, just for consistency in the overview table at the beginning (sorry for the nitpick).

@sidorov-ks (Contributor, Author)

Squashed everything - I think now I've done everything for this to be merged :)

@f0k (Member) commented Jul 12, 2017

Woohoo! Merging, thanks again!

@f0k f0k merged commit 6327a74 into Lasagne:master Jul 12, 2017
@f0k (Member) commented Jul 12, 2017

By the way, next time I'd recommend starting with git checkout -b selu so you have a separate branch for the feature you're working on, instead of implementing it in your local master branch: https://guides.github.com/introduction/flow/. This will make things easier for you and also let you work on different ideas in parallel.



selu = SELU(scale=1.0507009873554804934193349852946,
            scale_neg=1.6732632423543772848170429916717)
A post-merge comment on this code:

I think it can be error-prone that the default values for selu() are different from those for SELU(). Why not make the values from the paper the default for SELU.__init__() as well?

@f0k (Member) commented Jul 28, 2017

I see your point. But the SELU class implements the general concept of a scaled ELU, and scale=1.0507009873554804934193349852946 is not exactly the scale that would come to mind first. Also I think it's not easy to confuse

DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.selu)

with

DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.SELU())

Note that the difference is not only in the capitalization, but also in the brackets (to create an instance of the class). I think this will hardly happen by accident.

I'd be fine with adding a note to the class docstring that it defaults to elu, and that selu provides the version from the paper.
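A quick sanity check of that equivalence - a sketch assuming the merged defaults of scale=1 and scale_neg=1:

import numpy as np
import theano
import theano.tensor as T
from lasagne.nonlinearities import SELU, elu

# with the default scale=1, scale_neg=1, SELU reduces to elu
x = T.vector('x')
f = theano.function([x], [SELU()(x), elu(x)])
out_selu, out_elu = f(np.array([-2.0, -0.5, 0.0, 1.5],
                               dtype=theano.config.floatX))
assert np.allclose(out_selu, out_elu)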
