
Added tag "bn_stat" to mean and inv_std params #837

Merged
merged 1 commit into Lasagne:master on Feb 21, 2018

Conversation

astooke (Contributor) commented May 29, 2017

Seems handy.

f0k (Member) commented May 29, 2017

Seems handy.

What's the use case? I'm fine with it if there's a good use. It should cover the cuDNN batch normalization layer as well, though. And I'd prefer batch_norm_stat, for consistency with the keyword arguments taken by get_output_for, and to avoid confusion in the future when somebody invents a Boost Nullification layer or something.

astooke (Contributor, Author) commented May 29, 2017

I'm using it because I have an actual batch algorithm, where I want the statistics to come only from the current batch, in case I want to use more data than fits on one GPU to compute the statistics. Then I manually get and set the stats over multiple calls to a forward-pass function on slices of the data... so I need to grab these parameters out of the network.

OK, happy to change it to batch_norm_stat. Where is the cuDNN layer?

f0k (Member) commented May 30, 2017

Then I manually get and set the stats over multiple calls to a forward pass function over slices of the data....so I need to grab these parameters out from the network.

Note that you can achieve the same thing by changing the alpha value. Use alpha = 1/(1+n) for the nth slice (starting with n=0) to average the statistics over multiple slices. (This is essentially Welford's algorithm for the running mean.)
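For concreteness, here is a minimal NumPy sketch of that trick (not Lasagne code; the slice data is made up). The running update used for the batch statistics is running = (1 - alpha) * running + alpha * batch_stat, and with alpha = 1/(1+n) for the nth slice the running value equals the exact mean over all slices seen so far:

import numpy as np

slices = [np.random.randn(32, 10) for _ in range(4)]  # hypothetical data slices

running_mean = np.zeros(10)
for n, x in enumerate(slices):
    alpha = 1.0 / (1 + n)
    running_mean = (1 - alpha) * running_mean + alpha * x.mean(axis=0)

# identical to the mean computed over all slices at once
assert np.allclose(running_mean, np.concatenate(slices).mean(axis=0))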

OK happy to change to batch_norm_stat.

👍

Where is the cuDNN layer?

https://github.com/Lasagne/Lasagne/blob/39bc1b/lasagne/layers/dnn.py#L599

We might also want to add a test ensuring that you get back only the batch normalization statistics when calling get_params on a batch normalization layer with batch_norm_stat=True. Could you do this as well? Cheers!

astooke (Contributor, Author) commented May 30, 2017

I looked in the BatchNormDNNLayer class, but there are no calls to add_param. It's deferred to the regular BatchNormLayer, so I think that's the only place needing the change.

astooke (Contributor, Author) commented May 30, 2017

Note that you can achieve the same by changing the alpha value.

Oh! I was not aware it was possible to set alpha dynamically. The only place I saw it taken as an input was BatchNormLayer.__init__(), and thereafter only self.alpha is used?

astooke (Contributor, Author) commented May 30, 2017

I tried writing a short test method on the test class for the batch norm layer, but apparently something is wrong. Could you please check it out? I'm not very familiar with the test setup.

f0k (Member) commented Jun 1, 2017

It's defered to the regular BatchNormLayer, so I think that's the only place needing the change.

Ah, correct. How thoughtful.

Oh! I was not aware it was possible to set alpha dynamically.

I don't remember how I did the cuDNN implementation in Theano, but at least for the regular batch normalization layer, alpha can be a symbolic variable. So you can make it a shared variable and change its value when needed (or even include it in an update dictionary to set it to 1 / (1+n)). This is similar to how you can make the learning rate dynamic: https://lasagne.readthedocs.io/en/latest/modules/updates.html#examples
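A rough sketch of what that could look like, assuming (as above) that BatchNormLayer accepts a symbolic alpha; the layer shape and the compiled function are placeholders:

import numpy as np
import theano
from lasagne.layers import InputLayer, BatchNormLayer

alpha = theano.shared(np.float32(1.0), name='bn_alpha')  # adjustable between calls
layer = BatchNormLayer(InputLayer((None, 10)), alpha=alpha)

# ... compile a forward/training function as usual, then per data slice n:
# alpha.set_value(np.float32(1.0 / (1 + n)))
# fn(slice_n)  # hypothetical compiled function run on the nth slice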

But apparently something is wrong. Can you please check it out?

You instantiated a layer as layer = BatchNormLayer(input_shape), but then you called get_params on the class instead of the instance: assert len(BatchNormLayer.get_params()) == 4. You'll need to use layer instead.
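For reference, a sketch of what the corrected test might look like; the tag name and parameter counts assume this PR's change, and the real test may differ:

from lasagne.layers import BatchNormLayer

layer = BatchNormLayer((2, 3))            # an instance, not the class
assert len(layer.get_params()) == 4       # beta, gamma, mean, inv_std
assert len(layer.get_params(batch_norm_stat=True)) == 2  # only mean and inv_std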

astooke (Contributor, Author) commented Jun 1, 2017

alpha can be a symbolic variable

Of course! Very nice; thanks, I will use this.

astooke (Contributor, Author) commented Jun 1, 2017

you called get_params on the class instead of the instance

Oops, not sure why I did that; thanks for catching it.

Incidentally, I noticed that when you apply batch normalization to a layer using e.g.:

hid = L.DenseLayer(...)
bn = L.batch_norm(hid)

then calling bn.get_params() yields an empty list, unlike when defining bn = L.BatchNormLayer(...) directly. I think this is because L.batch_norm() returns a nonlinearity layer (whereas DenseLayer keeps the nonlinearity as an attribute)? It doesn't really bother me now that I know; it was just a little surprise.

f0k (Member) commented Jun 2, 2017

I think this is because a nonlinearity layer is returned from L.batch_norm() (whereas DenseLayer has the nonlinearity as an attribute)?

Exactly, it's because the batch_norm() convenience function inserts batch normalization in front of the nonlinearity. Usually you won't call bn.get_params() anyway, but lasagne.layers.get_all_params(bn) to obtain the parameters of all layers, and then the batch normalization parameters are included.
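To illustrate (layer sizes are placeholders):

from lasagne.layers import InputLayer, DenseLayer, batch_norm, get_all_params

net = InputLayer((None, 10))
bn = batch_norm(DenseLayer(net, num_units=20))

print(bn.get_params())     # [] -- bn is the trailing NonlinearityLayer
print(get_all_params(bn))  # [W, beta, gamma, mean, inv_std] from all layers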

The changes look good now, could you please squash everything into a single commit?

git reset --soft b9e8c6f
git commit --amend -m 'Add "batch_norm_stat" tag to batch normalization mean and inv_std parameters'

f0k merged commit c712b4a into Lasagne:master on Feb 21, 2018