
Added SELU nonlinearity #843

Merged (2 commits into Lasagne:master, Jul 12, 2017)
Conversation

sidorov-ks (Contributor)

Implementing SELU nonlinearity (inspired by this arXiv paper)

This is my first commit to Lasagne, so I don't quite know the rules (yet). I understand, though, that just implementing a nonlinearity is not good enough to make the cut, so I would like to ask what else I should do. (Extra examples, docs, ...?)

Looking forward to your feedback ;)

@f0k (Member) commented Jun 14, 2017

Welcome, and thank you for the PR!

I understand, though, that just implementing a nonlinearity is not good enough to make the cut

Indeed! It also needs tests so the coverage stays at 100%, and it must be added to the nonlinearities.rst file so it shows up in the documentation. See the changesets of the closely related ELU (https://github.com/Lasagne/Lasagne/pull/518/files) and scaled tanh (https://github.com/Lasagne/Lasagne/pull/430/files).

See http://lasagne.readthedocs.io/en/latest/user/development.html#how-to-contribute for how to build the documentation and run the test suite locally. This lets you check that everything looks fine and coverage stays at 100%.

Besides, there are some other issues with your PR; see the inline review comments below.

@gyglim (Contributor) commented Jun 16, 2017

Hi, thanks for starting this PR!

It should also include AlphaDropout, which keeps the mean and variance bounded, in contrast to normal dropout. See:
keras-team/keras#6924 (comment)
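For reference, a rough NumPy sketch of the alpha dropout idea from the paper - not part of this PR; the function name and keep probability here are illustrative:

import numpy as np

def alpha_dropout(x, q=0.9, rng=np.random):
    # dropped units are set to the SELU saturation value
    # alpha' = -lambda * alpha, then an affine correction a*x + b
    # restores zero mean and unit variance (q is the keep probability)
    alpha_p = -1.0507009873554805 * 1.6732632423543772
    a = (q + alpha_p ** 2 * q * (1 - q)) ** -0.5
    b = -a * (1 - q) * alpha_p
    mask = rng.binomial(1, q, size=x.shape)
    return a * np.where(mask, x, alpha_p) + b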

@f0k (Member) commented Jun 21, 2017

It should also include AlphaDropout,

Yes, but I'd treat this as a separate PR; it's fine to add one after the other.

@partobs-mdp: Will you still work on this or should somebody else take it? The paper is getting old ;)

@sidorov-ks (Contributor, Author)

@f0k I'm working on it (right now trying to build the docs) - I just had a hard time juggling university exams and my GSoC project 😞

@f0k (Member) commented Jun 21, 2017

just had a hard time juggling university exams and my GSoC project

No worries! Just wanted to check back. Let me know if you need help.

@sidorov-ks (Contributor, Author) commented Jun 21, 2017

Speaking of docs, I can't run make html because I run into some weird bug: Theano crashes while building the docs. Log: https://gist.github.com/partobs-mdp/04b70d4a80d229c541931e8b22324999

I'm running make html from my venv (make html SPHINXBUILD='python /usr/bin/sphinx-build' doesn't help either).

UPDATE: I failed to paste the complete log at first; the gist now contains the entire error log, including the final Python stack trace.

@sidorov-ks (Contributor, Author)

Aside from that problem and the tests, I seem to have implemented everything from @f0k's initial review comment - so help on that doc issue would be much appreciated ;-)

The current code (the one that reproduces the error I described above) is in the last commit.

FWIW, I also attach the output of make html run without the venv: https://gist.github.com/partobs-mdp/8517691031c93035fee50dd2a101271d

@f0k (Member) commented Jun 21, 2017

Speaking of docs, I can't run make html

Sorry, this was fixed in #846; you'll need to rebase:

git fetch upstream master
git rebase upstream/master
git push --force

Or a shorter version:

git pull --rebase upstream master
git push --force

Let me know if it works!

@f0k (Member) left a comment

It's getting there! A few more comments below, and it's still missing the tests.

@@ -14,6 +14,7 @@
     leaky_rectify
     very_leaky_rectify
     elu
+    SELU
f0k (Member):

also add selu here

@@ -33,6 +34,8 @@ Detailed description
 .. autofunction:: leaky_rectify
 .. autofunction:: very_leaky_rectify
 .. autofunction:: elu
+.. autoclass:: SELU
+    :members:
f0k (Member):

and .. autofunction:: selu

# selu
class SELU(object):
    """Scaled Exponential Linear Unit
    :math:`\\varphi(x) = \\lambda (x > 0 ? x : \\alpha(e^x-1)`
f0k (Member):

You're missing a closing bracket at the end.

            self.scale_neg * (theano.tensor.exp(x) - 1))


selu = SELU(scale=1.0507, scale_neg=1.6733)
f0k (Member):

Just because we can, I'd use all digits given by the authors: https://github.com/bioinf-jku/SNNs/blob/49c5298/selu.py#L23-L24.
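For reference, the full-precision values from that file (they also appear in the later diffs in this thread):

scale     = 1.0507009873554804934193349852946
scale_neg = 1.6732632423543772848170429916717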

selu = SELU(scale=1.0507, scale_neg=1.6733)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha=1.6733, \\lambda=1.0507`
(the values recommended in the original paper on self-normalizing networks)
f0k (Member):

Should be two sentences, because the first sentence is used for the overview table at the beginning of the module documentation. I'd suggest:

Instance of :class:`SELU` with :math:`\\alpha \\approx 1.6733, \\lambda \\approx 1.0507`.
This has a stable and attracting fixed point of :math:`\\mu=0`, :math:`\\sigma=1` under
the assumptions of the original paper on self-normalizing neural networks.


See Also
--------
selu: Instance with :math:`\\alpha=1.6733, \\lambda=1.0507`, as recommended in [1]_.
f0k (Member):

Let's stay more neutral and say as used in [1]_. They don't necessarily recommend it, at least not so generally.

        return self.scale * theano.tensor.switch(
            x > 0.0,
            x,
            self.scale_neg * (theano.tensor.exp(x) - 1))
f0k (Member):

We generally use 8 spaces for hanging indent, not 4. Also I'd prefer self.scale_neg * theano.tensor.expm1(x) for the last case.
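Pulling these suggestions together, the nonlinearity would end up roughly like this - a sketch for orientation, not necessarily the exact merged code:

import theano.tensor

class SELU(object):
    def __init__(self, scale=1, scale_neg=1):
        self.scale = scale
        self.scale_neg = scale_neg

    def __call__(self, x):
        # 8-space hanging indent and expm1, as suggested above
        return self.scale * theano.tensor.switch(
                x > 0.0,
                x,
                self.scale_neg * theano.tensor.expm1(x))

# instance with the full-precision constants from the paper
selu = SELU(scale=1.0507009873554804934193349852946,
            scale_neg=1.6732632423543772848170429916717)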

@sidorov-ks (Contributor, Author)

Looks like I've implemented everything from @f0k's review - now switching to writing tests.

@f0k (Member) commented Jun 23, 2017

Thanks! There are several PEP8 violations, though; you'll also see them when you run the tests locally. Just remember to fix them as well.

@sidorov-ks (Contributor, Author)

Fixed the PEP8 issues and added several tests for SELU. There is one weird issue in doctest (it also arises in other class-based nonlinearities, e.g. ScaledTanH); I don't really know what to do about it.

@sidorov-ks (Contributor, Author)

Why is the review still red, by the way? I have implemented everything from it (or at least I think so).

@sidorov-ks (Contributor, Author)

The Travis CI build has finished - it crashed on those three classes (LeakyRectify, ScaledTanH and SELU). What have I messed up?

@ebenolson (Member) commented Jun 23, 2017 via email

@sidorov-ks (Contributor, Author)

@ebenolson, thanks - looks like I've managed to get it right now :)

@f0k (Member) left a comment

Thank you, the tests look good! But you messed something up while fixing PEP8 compliance; see my comment.

@@ -9,12 +9,10 @@
 # sigmoid
 def sigmoid(x):
     """Sigmoid activation function :math:`\\varphi(x) = \\frac{1}{1 + e^{-x}}`

f0k (Member):

In commit 58904e3, you removed almost all blank lines in lasagne/nonlinearities.py. Please undo!

@sidorov-ks (Contributor, Author) commented Jun 27, 2017

First of all, sorry for the late reply - I did something dumb while running pacman -Syua. As a result, I'm now back to a fresh Ubuntu install without Anaconda and other good stuff. However, I do have git and vim now 😃, so I have restored the blank lines as requested in your review.

Right now I'm waiting for the Travis CI build to surface any test / PEP8 failures to fix.

@sidorov-ks (Contributor, Author)

Well, one blank line was indeed extra (nonlinearities.py:273). Fixed that, and I think this check will now run smoothly.

@f0k (Member) left a comment

Travis is happy, thanks! Some more minor comments, mostly about the rendering of the documentation. Please check these; see http://lasagne.readthedocs.io/en/latest/user/development.html#documentation for how.

@@ -107,7 +107,6 @@ class ScaledTanH(object):

     By carefully matching :math:`\\alpha` and :math:`\\beta`, the nonlinearity
     can also be tuned to preserve the mean and variance of its input:
-
f0k (Member):

You missed this one... it will break the rendering of the documentation. Please always check the complete diff!

    See Also
    --------
    selu: Instance with :math:`\\alpha\\approx 1.6733,
        \\lambda\\approx 1.0507`, as used in [1]_.
f0k (Member):

I think you cannot break this into two lines. Please check how the HTML documentation renders. Possibly use = instead of \\approx. It's not entirely correct, but close enough.

.. [1] Günter Klambauer, Thomas Unterthiner,
Andreas Mayr, Sepp Hochreiter (2017):
Self-Normalizing Neural Networks,
https://arxiv.org/abs/1706.02515
f0k (Member):

I think you cannot mix indentations here, but I may be mistaken. Please check how the HTML documentation renders.

@sidorov-ks (Contributor, Author):

I tried to resolve the problem, but I simply can't get the line

.. [1] Günter Klambauer, Thomas Unterthiner, Andreas Mayr, Sepp Hochreiter (2017):

under 80 characters, which trips the PEP8 check.

Otherwise, I solved all the problems you mentioned. Waiting for your feedback ;)

Self-Normalizing Neural Networks,
https://arxiv.org/abs/1706.02515
"""

f0k (Member):

There shouldn't be a blank line here, compare to the other classes.

elif nonlinearity.startswith('selu'):
    from lasagne.nonlinearities import SELU, selu
    if nonlinearity == 'selu':
        theano_nonlinearity = SELU(scale=1, scale_neg=1)
f0k (Member):

I'd even remove the scale=1, scale_neg=1 here, so the test also ensures nobody can change the default values without us noticing.

            scale_neg=1.6732632423543772848170429916717)
selu.__doc__ = """selu(x)
Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
f0k (Member):

Can you please check whether this ends up in a single line in the overview table of the HTML documentation of the nonlinearities module? I'm not sure if the line break is a problem here. Cheers!

@sidorov-ks (Contributor, Author)

Trying to get the Lasagne dev version running on my <sigh> new Ubuntu 16.04 LTS, but I ran into two strange bugs (quoted in the reply below):

@f0k (Member) commented Jun 30, 2017

Py.test bug - py.test run fails to collect any items.

Hmm, I don't immediately see what could be the reason. It doesn't seem to be the primary error, but you seem to be missing the mock module. Did you install everything from the requirements-dev.txt file?

Sphinx bug

Can you try editing lasagne/docs/conf.py so that before sys.modules['pylearn2'] = Mock() you have:

theano.gpuarray = Mock()
sys.modules['theano.gpuarray'] = theano.gpuarray
sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True

I don't know why it works on my machine and fails on yours, though. It must be taking a different code path somewhere.

@sidorov-ks (Contributor, Author)

Did you install everything from the requirements-dev.txt file?

No (wtf?), the mock module was indeed missing. The test suite now runs smoothly.

Sphinx bug

After editing the file, make html works - thanks!

@sidorov-ks (Contributor, Author)

What is the status of this PR? I seem to have done everything for it - am I missing something?

@reynoldscem commented Jul 11, 2017

To me it's not very pleasant that it has a different API from the other nonlinearities.

I think it would be nicer to just have the two parameters as named parameters instead of setting them in the constructor. By default they could be as specified by the authors, and if people want the equivalent of constructing the object as you do, they can use functools.partial or similar.

e.g.

from functools import partial
from lasagne.nonlinearities import selu
selu = partial(selu, scale=1., scale_neg=1.)

edit: Oops, just realised this is already done...

https://github.com/Lasagne/Lasagne/pull/843/files/1631ec6a0121f80b51b99da41c6a3704f29a827d#diff-e803e2c6fef365217ea4b830787511e0R331

@f0k (Member) commented Jul 11, 2017

What is the status of this PR?

Thank you for the updates and sorry for the delay, I'm finishing up my thesis and cannot check up on Lasagne too frequently. I'll review the code in a minute.

I think it would be nicer to just have the two parameters as named parameters instead of setting them in the constructor.

functools.partial is not commonly known, and my_selu = functools.partial(lasagne.nonlinearities.selu, scale=a, scale_neg=b) is not any easier than my_selu = lasagne.nonlinearities.SELU(scale=a, scale_neg=b). So we chose to use classes to implement nonlinearities that have additional parameters, such as leaky ReLU or scaled tanh. For common use cases, there are predefined instances that can be used directly (as you noticed yourself afterwards).
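In practice the two options look like this (a sketch; the input layer and the custom parameter values are illustrative):

from lasagne.layers import InputLayer, DenseLayer
from lasagne.nonlinearities import selu, SELU

layer = InputLayer((None, 100))  # some existing layer

# predefined instance with the constants from the paper
l1 = DenseLayer(layer, num_units=123, nonlinearity=selu)

# custom instance of the underlying class
l2 = DenseLayer(layer, num_units=123,
                nonlinearity=SELU(scale=1.05, scale_neg=1.67))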

@f0k (Member) left a comment

Thank you, looks good! Just very minor things, and before merging, I'd like to squash this into two commits. This would go as follows (after fixing the remaining minor things):

git checkout -b backup master  # just in case
git checkout master
git fetch upstream master
git reset --soft upstream/master  # go to the latest upstream commit, but leave all other changes staged
git reset HEAD docs/conf.py  # unstage the sphinx config change
git commit -m 'Added SELU nonlinearity'  # commit all other changes
git add docs/conf.py  # stage the sphinx config change
git commit -m 'Adapt sphinx configuration for new gpu backend'
git push --force

If you want, I can also take care of these last tweaks, just let me know!

sys.modules['theano.gpuarray.dnn'] = theano.gpuarray.dnn
theano.gpuarray.pygpu_activated = True
theano.gpuarray.dnn.dnn_present = lambda: True

f0k (Member):

Technically, this should be a separate PR, but let's not overdo this, a separate commit will be good enough.

class SELU(object):
    """
    Scaled Exponential Linear Unit
    :math:`\\varphi(x)=\\lambda \\left[(x>0) ? x : \\alpha(e^x-1)\\right]`.
f0k (Member):

Just for consistency with the others, could you please remove the period at the end? (It's not a sentence.)


    scale_neg : float32
        The scale parameter :math:`\\alpha`
        for scaling output for negative argument values.
f0k (Member):

For consistency with the formula and implementation, it should be nonpositive instead of negative (nonpositive includes zero, negative doesn't).


See Also
--------
selu: Instance with :math:`\\alpha=1.6733,\\lambda=1.0507` as used in [1]_.
f0k (Member):

I just checked: It's possible to go the correct route with \\approx if it is indented correctly:

    See Also
    --------
    selu: Instance with :math:`\\alpha\\approx1.6733,\\lambda\\approx1.0507`
          as used in [1]_.

Could you do this?

selu.__doc__ = """selu(x)

Instance of :class:`SELU` with :math:`\\alpha\\approx 1.6733,
\\lambda\\approx 1.0507`.
f0k (Member):

Again, please remove the period, just for consistency in the overview table at the beginning (sorry for the nitpick).

@sidorov-ks (Contributor, Author)

Squashed everything - I think now I've done everything for this to be merged :)

@f0k (Member) commented Jul 12, 2017

Woohoo! Merging, thanks again!

@f0k f0k merged commit 6327a74 into Lasagne:master Jul 12, 2017
@f0k (Member) commented Jul 12, 2017

By the way, next time I'd recommend starting with git checkout -b selu so you have a separate branch for the feature you're working on, instead of implementing it in your local master branch: https://guides.github.com/introduction/flow/. This will make things easier for you and also let you work on different ideas in parallel.



selu = SELU(scale=1.0507009873554804934193349852946,
            scale_neg=1.6732632423543772848170429916717)
A post-merge comment on this code:

I think it can be error-prone that the default values for selu() are different from those for SELU(). Why not make the values from the paper the default for SELU.__init__() as well?

@f0k (Member) commented Jul 28, 2017

I see your point. But the SELU class implements the general concept of a scaled ELU, and scale=1.0507009873554804934193349852946 is not exactly the scale that would come to mind first. Also I think it's not easy to confuse

DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.selu)

with

DenseLayer(layer, 123, nonlinearity=lasagne.nonlinearities.SELU())

Note that the difference is not only in the capitalization, but also in the brackets (to create an instance of the class). I think this will hardly happen by accident.

I'd be fine with adding a note to the class docstring that it defaults to elu, and that selu provides the version from the paper.
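A quick sanity check of that equivalence - a sketch assuming the merged defaults of scale=1 and scale_neg=1:

import numpy as np
import theano
import theano.tensor as T
from lasagne.nonlinearities import SELU, elu

# with the default scale=1, scale_neg=1, SELU reduces to elu
x = T.vector('x')
f = theano.function([x], [SELU()(x), elu(x)])
out_selu, out_elu = f(np.array([-2.0, -0.5, 0.0, 1.5],
                               dtype=theano.config.floatX))
assert np.allclose(out_selu, out_elu)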
