
Dense keyword handling, and docstring #1440

Merged · 9 commits · Mar 6, 2021
Conversation

@mcabbott (Member, Author) commented Dec 26, 2020

Closes #1422, by killing the initW keyword, in favour of init as used by the Conv layers.

Also fixes "in×out weight matrix" which was incorrect.

And makes Dense(rand(2,3), bias) work like Dense(3,2; bias), which again is like the Conv layers.

Edit -- also closes #1421 now, ensuring that the bias vectors of both Conv and Dense layers match the eltype of the weights.
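For concreteness, a minimal sketch of the intended usage (assuming the post-PR, v0.12-style API; the exact signatures are whatever the diff settles on):

```julia
using Flux

# Explicit weight matrix: note it is (out × in), so this matches Dense(3, 2).
W = rand(Float32, 2, 3)
Dense(W)                      # trainable zero bias, created with eltype(W)
Dense(W, false)               # no bias
Dense(W, rand(Float32, 2))    # explicit bias vector, length = out

# Integer constructor, with `init` replacing the old `initW`, as in Conv:
Dense(3, 2, relu; init = Flux.glorot_uniform)
```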

PR Checklist

  • Tests are added
  • Entry in NEWS.md
  • Documentation, if applicable
  • Final review from @dhairyagandhi96 (for API changes).

@CarloLucibello (Member)

Should we add init to Dense, or add initW and initb to Conv?

@mcabbott (Member, Author) commented Dec 26, 2020

I think simpler is better here. I can imagine wanting a different scaling for initW, but my guess is that using initb is vanishingly rare, and that looking up how it's meant to work and then constructing the function it wants is probably more effort than just passing the vector you want.

For example, there is not a single use of this in the model zoo, although there are a few cases using initW=randn or some other custom function.
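For example, the migration for those zoo cases would look roughly like this (hypothetical calls under the proposed keyword):

```julia
# Old keyword, removed by this PR:
# Dense(3, 2, initW = randn)

# New keyword, matching the Conv layers; `init` is called with the weight dimensions:
Dense(3, 2, init = randn)

# A custom scaled initialisation, in the same spirit:
Dense(3, 2, init = (dims...) -> 0.01f0 .* randn(Float32, dims...))
```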

@CarloLucibello (Member)

Agreed, changing the bias init is very rare, so since we have to modify either Conv or Dense for consistency, better to go with the simplest interface.

One more thing should be changed, though: if the bias is not explicitly passed to the constructor, it should be created with the same eltype as the weights, whereas currently it is always Float32. This would solve #1421.

(Resolved review thread on src/layers/basic.jl)
@CarloLucibello added this to the v0.12 milestone Dec 27, 2020
@mcabbott (Member, Author) commented Dec 27, 2020

if the bias is not explicitly passed to the constructor, it should be created with the same eltype as the weights, whereas currently it is always Float32. This would solve #1421

Hadn't seen the issue. It sounds like we should change create_bias(bias, zeros, out) to sort this out everywhere, by passing it the eltype of the weights? Or better the whole array... It should (1) pass through false, or (2) create via similar(W,out) .= 0, or (3) convert a given vector to match eltype.

@CarloLucibello (Member) commented Dec 27, 2020

Hadn't seen the issue.

me neither, just found it scrolling open issues

It sounds like we should change create_bias(bias, zeros, out) to sort this out everywhere, by passing it the eltype of the weights? Or better the whole array... It should (1) pass through false, or (2) create via similar(W,out) .= 0, or (3) convert a given vector to match eltype.

The interface could be create_bias(bias, W, out) and then do (1) and (2) but not (3); I wouldn't interfere with explicitly passed biases.

(Resolved review thread on src/layers/basic.jl)
@mcabbott (Member, Author) commented Dec 27, 2020

Oh I didn't look before I wrote this, sorry. I went for create_bias(W, bias, out). I do think we should convert, though, as I cannot see having mixed eltypes here as being anything but a mistake. If you wish to make a layer that promotes, then I think you should explicitly write that, and not rely on the broadcasting of bias to do it for you. The conversion could be made noisy, in case silently converting is potentially confusing?
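For reference, a rough sketch of the helper as discussed here, with the pass-through, zero-fill, and conversion behaviours; the actual code in src/utils.jl may differ in detail:

```julia
function create_bias(weights::AbstractArray, bias, dims::Integer...)
  bias === false && return false                                 # (1) pass `false` straight through
  bias === true  && return fill!(similar(weights, dims...), 0)   # (2) zero bias with the weights' eltype
  size(bias) == dims ||                                          # size check on an explicitly given bias
    throw(DimensionMismatch("expected bias of size $dims, got size $(size(bias))"))
  return convert(AbstractArray{eltype(weights)}, bias)           # (3) convert explicit bias to match eltype
end
```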

@mcabbott (Member, Author)

Adding a size check to create_bias found a number of mistakes in tests along the lines of Dense(rand(2,3), rand(3)) --- it's a bit confusing that this isn't Dense(2, 3, bias=rand(3)).

I am sure I've suggested this before, but what thoughts on Dense(2 => 3, bias=rand(3)) as a better constructor? This would match how Conv writes channels.

If we do this, I think Dense(in, out) should still silently work, so as not to annoy everyone with old code, but the printing & docs could be updated.
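To spell out the confusion (hypothetical calls, post-PR behaviour assumed):

```julia
# The positional weight matrix is (out × in), but the integer constructor is (in, out):
Dense(rand(2, 3), rand(2))      # 3 inputs → 2 outputs, so the bias must have length 2
Dense(2, 3, bias = rand(3))     # 2 inputs → 3 outputs, bias length 3

# Hence Dense(rand(2,3), rand(3)) is a size mismatch once create_bias checks lengths.
```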

(Resolved review thread on src/utils.jl)
@CarloLucibello (Member)

I am sure I've suggested this before, but what thoughts on Dense(2 => 3, bias=rand(3)) as a better constructor? This would match how Conv writes channels.

I'm unsure about this: on the one hand, Dense(2 => 3) is clearer, but Dense(2, 3) has been there forever and is much nicer to write. Since I can foresee an endless discussion about this, let's leave it out of this PR.

@mcabbott (Member, Author) commented Dec 27, 2020

OK. Then perhaps this is done, I think, modulo one doc typo I just saw.

It doesn't at present give a warning when converting bias to match the weight's eltype; I have this locally, but no strong feelings either way.

(Resolved review thread on src/utils.jl)
CarloLucibello previously approved these changes Dec 27, 2020
@CarloLucibello (Member)

Also, recurrent layers have initb ...

@mcabbott (Member, Author) commented Jan 1, 2021

Oh I didn't see the RNN ones. Not this PR, I think.

@ToucheSir (Member)

thoughts on Dense(2 => 3, bias=rand(3)) as a better constructor?

Worth noting that Conv already uses this syntax for in_channels => out_channels.

@mcabbott (Member, Author) commented Jan 1, 2021

Worth noting that Conv already uses this

Yes, maybe I forgot to say that. Although I think of that as a plus: Dense treats the first dimension as channels. Is that what you meant too, or do you see a conflict of meanings?

@ToucheSir (Member)

I meant that, while we're on the topic of inconsistencies between the Dense, Conv and RNN constructors, we might as well note that the syntax for specifying the number of input and output features (/channels) is also inconsistent; e.g. PyTorch uses separate in and out arguments for all three.

@mcabbott (Member, Author) commented Jan 1, 2021

OK! I am very happy to make that change here, or in another PR; this one is done, I think, and it's perhaps better to do things one by one.

@ToucheSir (Member)

Yeah, ultimately it's more of a cosmetic change and I'm not qualified to comment on what the ROI for breaking the interface might be. We can revisit this in another issue :)

CarloLucibello previously approved these changes Jan 2, 2021
@darsnack (Member)

Bump?

@mcabbott (Member, Author)

Thoughts on including #670 (comment), btw? It doesn't have to be now, obviously, but it isn't a bad fit. I just didn't quite get the W conventions right, I think.

@darsnack (Member)

I would say that since we support the ability to do that kind of initialization, this PR is still good to go. Things like the snippet you linked would be nice for an overhauled initialization section in the docs. That snippet addresses some particular conventions for Dense but the ideas go beyond Dense to other layers as well. It makes more sense to me to summarize all that in a docs section at a later point.

No strong feelings though.

@mcabbott (Member, Author)

OK, yes maybe best not to delay this with any further additions.

@CarloLucibello (Member)

Yes, I think we should have that snippet somewhere to help reproducibility, though it's not material for this PR. Maybe we should have a "moving from PyTorch" or a "differences from PyTorch" section in the docs at some point.

@CarloLucibello (Member)

Another bump for this; it just needs @DhairyaLGandhi's approval for merge. Let's please prioritize the v0.12 stuff.

@CarloLucibello (Member)

@DhairyaLGandhi close your eyes and hit approve, we are going to be fine :)

@CarloLucibello (Member)

bors r+

bors bot added a commit that referenced this pull request Mar 5, 2021
1440: `Dense` keyword handling, and docstring r=CarloLucibello a=mcabbott

Closes #1422, by killing the `initW` keyword, in favour of `init` as used by the Conv layers. 

Also fixes "in×out weight matrix" which was incorrect. 

And makes `Dense(rand(2,3), bias)` work like `Dense(3,2; bias)`, which again is like the Conv layers.

Edit -- also closes #1421 now, ensuring that the bias vectors of both Conv and Dense layers match the eltype of the weights. 

### PR Checklist

- [x] Tests are added
- [x] Entry in NEWS.md
- [x] Documentation, if applicable
- [ ] Final review from `@dhairyagandhi96` (for API changes).


Co-authored-by: Michael Abbott <me@escbook>
@DhairyaLGandhi (Member) commented Mar 6, 2021

Yeah, I'm not comfortable with this change. I'm also not clear on what the exact benefit is; it seems like the field names etc. change, but was anyone asking for that?

If all you wanted was the ability to access Dense with different symbols, just do getproperty(d::Dense, s::Symbol) = s == :weight ? d.W : getfield(d, s), etc.

Which is much more acceptable.
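Written out, that alternative would look roughly as follows (a hypothetical sketch keeping the pre-PR field name W; not what this PR actually does):

```julia
# Hypothetical: keep the field named `W`, but let `d.weight` also work.
function Base.getproperty(d::Dense, s::Symbol)
  s === :weight ? getfield(d, :W) : getfield(d, s)
end
```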

@DhairyaLGandhi (Member)

bors r+

@bors bot (Contributor) commented Mar 6, 2021

Build succeeded.
