
Added tag "bn_stat" to mean and inv_std params #837

Merged
merged 1 commit into Lasagne:master on Feb 21, 2018

Conversation

astooke (Contributor) commented May 29, 2017

Seems handy.

f0k (Member) commented May 29, 2017

Seems handy.

What's the use case? I'm fine with it if there's a good use. It should cover the cuDNN batch normalization layer as well, though. And I'd prefer batch_norm_stat, for consistency with the keyword arguments taken by get_output_for, and to avoid confusion in the future when somebody invents a Boost Nullification layer or something.

astooke (Contributor, Author) commented May 29, 2017

I'm using it because I have an actual batch algorithm, where I want the statistics to come only from the current batch, in case I want to use more data than fits on one GPU to compute the statistics. Then I manually get and set the stats over multiple calls to a forward-pass function on slices of the data... so I need to grab these parameters out of the network.

OK, happy to change it to batch_norm_stat. Where is the cuDNN layer?

f0k (Member) commented May 30, 2017

Then I manually get and set the stats over multiple calls to a forward pass function over slices of the data....so I need to grab these parameters out from the network.

Note that you can achieve the same thing by changing the alpha value. Use alpha = 1/(1+n) for the nth slice (starting with n=0) to average the statistics over multiple slices. (This is essentially Welford's algorithm for the running mean.)
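For concreteness, here is a minimal NumPy sketch of that trick (not Lasagne code; the slice data is made up). The running update used for the batch statistics is running = (1 - alpha) * running + alpha * batch_stat, and with alpha = 1/(1+n) for the nth slice the running value equals the exact mean over all slices seen so far:

import numpy as np

slices = [np.random.randn(32, 10) for _ in range(4)]  # hypothetical data slices

running_mean = np.zeros(10)
for n, x in enumerate(slices):
    alpha = 1.0 / (1 + n)
    running_mean = (1 - alpha) * running_mean + alpha * x.mean(axis=0)

# identical to the mean computed over all slices at once
assert np.allclose(running_mean, np.concatenate(slices).mean(axis=0))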

OK happy to change to batch_norm_stat.

👍

Where is the cuDNN layer?

https://github.com/Lasagne/Lasagne/blob/39bc1b/lasagne/layers/dnn.py#L599

We might also want to add a test ensuring that you get back only the batch normalization statistics when calling get_params on a batch normalization layer with batch_norm_stat=True. Could you do this as well? Cheers!

astooke (Contributor, Author) commented May 30, 2017

I looked in the BatchNormDNNLayer class, but there are no calls to add_param. It's deferred to the regular BatchNormLayer, so I think that's the only place needing the change.

astooke (Contributor, Author) commented May 30, 2017

Note that you can achieve the same by changing the alpha value.

Oh! I was not aware it was possible to set alpha dynamically. The only place I saw it taken as an input was BatchNormLayer.__init__(), and thereafter only self.alpha is used?

astooke (Contributor, Author) commented May 30, 2017

I tried writing a short test method on the test class for the batch norm layer, but apparently something is wrong. Could you please check it out? I'm not very familiar with the test setup.

f0k (Member) commented Jun 1, 2017

It's defered to the regular BatchNormLayer, so I think that's the only place needing the change.

Ah, correct. How thoughtful.

Oh! I was not aware it was possible to set alpha dynamically.

I don't remember how I did the cuDNN implementation in Theano, but at least for the regular batch normalization layer, alpha can be a symbolic variable. So you can make it a shared variable and change its value when needed (or even include it in an update dictionary to set it to 1 / (1+n)). This is similar to how you can make the learning rate dynamic: https://lasagne.readthedocs.io/en/latest/modules/updates.html#examples
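A rough sketch of what that could look like, assuming (as above) that BatchNormLayer accepts a symbolic alpha; the layer shape and the compiled function are placeholders:

import numpy as np
import theano
from lasagne.layers import InputLayer, BatchNormLayer

alpha = theano.shared(np.float32(1.0), name='bn_alpha')  # adjustable between calls
layer = BatchNormLayer(InputLayer((None, 10)), alpha=alpha)

# ... compile a forward/training function as usual, then per data slice n:
# alpha.set_value(np.float32(1.0 / (1 + n)))
# fn(slice_n)  # hypothetical compiled function run on the nth slice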

But apparently something is wrong. Can you please check it out?

You instantiated a layer as layer = BatchNormLayer(input_shape), but then you called get_params on the class instead of the instance: assert len(BatchNormLayer.get_params()) == 4. You'll need to use layer instead.
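For reference, a sketch of what the corrected test might look like; the tag name and parameter counts assume this PR's change, and the real test may differ:

from lasagne.layers import BatchNormLayer

layer = BatchNormLayer((2, 3))            # an instance, not the class
assert len(layer.get_params()) == 4       # beta, gamma, mean, inv_std
assert len(layer.get_params(batch_norm_stat=True)) == 2  # only mean and inv_std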

astooke (Contributor, Author) commented Jun 1, 2017

alpha can be a symbolic variable

Of course! Very nice; thanks, I will use this.

astooke (Contributor, Author) commented Jun 1, 2017

you called get_params on the class instead of the instance

Oops, not sure why I did that; thanks for catching it.

Incidentally, I noticed that when you apply batch normalization to a layer using e.g.:

hid = L.DenseLayer(...)
bn = L.batch_norm(hid)

then calling bn.get_params() yields an empty list, unlike when defining bn = L.BatchNormLayer(...) directly. I think this is because L.batch_norm() returns a nonlinearity layer (whereas DenseLayer keeps the nonlinearity as an attribute)? It doesn't really bother me now that I know; it was just a little surprise.

f0k (Member) commented Jun 2, 2017

I think this is because a nonlinearity layer is returned from L.batch_norm() (whereas DenseLayer has the nonlinearity as an attribute)?

Exactly, it's because the batch_norm() convenience function inserts batch normalization in front of the nonlinearity. Usually you won't call bn.get_params() anyway, but lasagne.layers.get_all_params(bn) to obtain the parameters of all layers, and then the batch normalization parameters are included.
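To illustrate (layer sizes are placeholders):

from lasagne.layers import InputLayer, DenseLayer, batch_norm, get_all_params

net = InputLayer((None, 10))
bn = batch_norm(DenseLayer(net, num_units=20))

print(bn.get_params())     # [] -- bn is the trailing NonlinearityLayer
print(get_all_params(bn))  # [W, beta, gamma, mean, inv_std] from all layers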

The changes look good now, could you please squash everything into a single commit?

git reset --soft b9e8c6f
git commit --amend -m 'Add "batch_norm_stat" tag to batch normalization mean and inv_std parameters'

f0k merged commit c712b4a into Lasagne:master on Feb 21, 2018