Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Allow constant fields in mutable types #11430

Closed
wants to merge 2 commits into from
Closed

Conversation

mbauman
Copy link
Sponsor Member

@mbauman mbauman commented May 24, 2015

This was a fun afternoon hack. There's still a few odd ends to wrap up, but I was surprised how easy this turned out to be. I want to test the waters here… is this something that is wanted? It would close #9448, and it trivially enables the sorts of optimizations I am seeking in #9974 if you're able to mark the field constant. Upon marking BitArray's chunks array as constant, it gives the same sorts of speedups I saw in #11403 (but more systematically and without going through contortions or having troubles with pointer lookups within loops — the BitArray code could be simplified more now, too).

Some todos:

  • Is this a feature we want? Are there advantages to this over an immutable type with mutable references?
  • Better error messages for re-assignment attempts
  • This allows const declarations without an assignment everywhere. In the global scope it behaves how you'd expect (allowing the first assignment but no more), and in the local scope that currently isn't enforced anyway. But f() = const x segfaults. It should probably have the same error message as f() = global x.
  • Some code cleanup and renaming. It's currently setting field[i].isconst within jl_compute_field_offsets… which isn't really the right name or place to do this.
  • This steals a bit from the maximum field size, reducing it to 2^14-1. This is really the only place where this might be breaking. Is that ok? I'm not sure if there's really a sensible alternative.

Further projects:

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 24, 2015

cc: @carlobaldassi. I also played with making BitArrays immutable, and while it sped a few operations up, the extra pointer dereference and cache non-locality in having a mutable Ref for length really killed performance in general. (Edit: see below) See the mb/frozenbits branch on my fork.

@JeffBezanson
Copy link
Sponsor Member

This seems to be done well.

I think the more usual way to get this functionality is to use mutable Ref cells inside immutable objects. That is much simpler, since you only need the concept of mutability that already exists, instead of adding more features to the data model.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 24, 2015

Admittedly, I didn't profile where the extra time was getting spent on my attempt at doing that (see my simultaneous comment above). Perhaps it'd be worth my while to try optimizing an immutable BitArray further.

@JeffBezanson
Copy link
Sponsor Member

You would need to use RefValue, since Ref is an abstract type. I do find that naming scheme pretty unfortunate.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 24, 2015

Aha, I'll give that a shot.

@carnaval
Copy link
Contributor

Very cool. I think it would be cleaner if eventually the immutable keyword would only make the frontend rewrite the type body by adding const on every field. We would then flag the type as immutable iff all fields are const.

@JeffBezanson
Copy link
Sponsor Member

@vtjnash Can we rename RefValue to Ref to make writing mutable fields nicer? We've generally tried to move away from using nice short names for abstract types. We could call the abstract version ByRef, or give cconvert to Ref special behavior.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 24, 2015

Ah, sure enough, that's much better. I don't have much more time to play with it today, but running the BitArray tests on that branch is now just slightly slower than Master. If I can hammer out a few regressions it'll probably be even better for my IntSets refactor (and DataArrays, too) since an immutable BitArray will be stored inline. I'll play with it more later. 👍

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 24, 2015

@JeffBezanson in a trend adopted from my work on Gtk.jl (specifically the Mutable experiment hat formed the basis for the eventual Ref PR, I tend to want to encourage users to refrain from ever referring to the concrete RefValue type an instead work with the abstract concept of a murable reference. I realize that this bucks the trend of ensure that every field is concretely typed, but I'm not convinced that is not simply a premature optimization in such cases.

@sjkelly
Copy link
Contributor

sjkelly commented May 25, 2015

I wonder if it would be useful to dispatch if a field is const. Maybe there could be a const type, Const(5) = Const{5}().

@JeffBezanson
Copy link
Sponsor Member

@vtjnash Yes there are 2 concerns: we want to use the abstract type for ccall and uses like Ref(x), but we want to use a concrete type for mutable fields. We need to balance those.

How many kinds of references are there, really? In almost every case you want a mutable single-value cell (RefValue). The only exception seems to be Array. We could make passing an Array as a Ref argument to ccall a special case using cconvert.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 25, 2015

Alright, I spent a little more time this morning comparing this approach to the immutable + Ref approach in mb/frozenbits. As far as I can tell, there's about a 3x performance regression involved in the creation of an immutable BitVector with its mutable Ref as compared to just allocating one mutable type. If that can be improved, then I think const fields may not be needed.

Edit: I suppose that this is a constant overhead that is only significant on smaller BitArrays. The 3x number above was for a small BitArray of < 64 elements. With larger BitArrays the cost is overwhelmed by the array allocation.

@carlobaldassi
Copy link
Member

I think the more usual way to get this functionality is to use mutable Ref cells inside immutable objects. That is much simpler, since you only need the concept of mutability that already exists, instead of adding more features to the data model.

Yet, comparing the implementations, it seems that the const-field approach is simpler from the user's perspective. For example, in the immutable BitArray case one needs two extra parameters to the type which are essentially only an implementation detail. Plus, BitArray{N} would no longer be a concrete type (even though the common cases of BitVector and BitMatrix are).

Performance-wise, I don't know if the regressions which @mbauman observes are solvable. However, I was thinking about code like this:

B = BitArray(N)
for i in mylistofindices
    x += B[i]
end

i.e. without @inbounds because you don't necessarily control mylistofindices. Intuitively, it would seem that the getindex call would need to check the len field all the time, which would require to dereference the pointer, and I doubt that it could be hoisted out of the loop by the compiler. In the const-field approach, however, it seems much easier for the compiler to figure out that both len and chunks are not changing. Am I making any sense?

@JeffBezanson
Copy link
Sponsor Member

What would the two extra type parameters be? I don't quite get that.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 25, 2015

What would the two extra type parameters be? I don't quite get that.

It's a just a hack to only have mutable lengths on BitVectors while concretely typing all fields. Really, this is crying for a @generated type. See my comments here: https://github.com/mbauman/julia/blob/mb/frozenbits/base/bitarray.jl#L7-L10

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 25, 2015

Effectively, it's

@generated immutable BitArray{N}
    quote
        chunks::Vector{UInt64}
        len::$(N == 1 ? RefValue{Int} : Int)
        dims::$(N == 1 ? Void : NTuple{N, Int})
        ...
    end
end

@JeffBezanson
Copy link
Sponsor Member

That's not really fair --- it's a feature that you can parameterize over whether something is a RefValue. AFAICT you can't do that with the const declaration. However I agree this is not really desirable; the extra parameters are ugly and all code for the type has to handle that variation.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 25, 2015

That is true, I'm not selectively marking the len field constant for non-Vectors in this PR (which, interestingly enough, seems rather insane at first glance… even though it was the first optimization I made on the immutable branch and is effectively the same thing).

@StefanKarpinski
Copy link
Sponsor Member

@vtjnash: Fwiw, I also found the Ref vs RefValue business confusions and unintuitive when I first started playing with it. I sympathize with wanting to encourage people to program to abstractions rather than conceptions. That's why AbstractArray was originally called Tensor and AbstractVector was Vector, etc. Dense versions were called Array and DenseVector. That didn't work out well and we changed it before anyone but Jeff, Viral and I had used it. To me this Ref and RefValue naming is the same story all over again. Bottom line, I think that simple concrete names should be used for simple concrete things; programming correctly to an abstraction requires awareness and care, so having a name that reminds you of that is a good thing.

@andyferris
Copy link
Member

We've generally tried to move away from using nice short names for abstract types. We could call the abstract version ByRef

@JeffBezanson Wouldn't the first sentence imply that the abstract type could be Reference, withRef as the concrete (and RefArray as another concrete)?

I've also been confused/bitten by trying to use Ref as a concrete type, and I've seen the same problem in other people's code. PS - will RefArray go away when/if Buffer is implemented and Array is fully-Julian?

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 6, 2016

You generally shouldn't need to be be coding against the subtypes of Ref, since the abstract behavior is more useful than it's concrete implementations. And in fact, the behavior needs to be abstract, since the behavior of the concrete object is different, and already defined by the language, and generally not what you want.

No, the existence of RefArray isn't due to how Array is implemented, it is required by the definition of Ref. Its implementation might change, but I wouldn't really expect it to, and that's internal anyways.

@andyferris
Copy link
Member

OK, thanks Jameson. I should try and understand RefArray a little better. What I meant to say is that sometimes I want to use something along the lines of

immutable MyType
    a::Int
    b::Ref{Int}
end

and if you want to use (faster) concrete types you need to realize if you actually want RefValue here. This pattern will be quite useful if/when such an immutable gets allocated on the stack (as a struct with an Int and a pointer).

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 6, 2016

there's nothing wrong with a field type of Ref{Int}

@andyferris
Copy link
Member

Really? That would be nice if they are equally economical. I'm curious - are these two are completely the same in the way the binary data is structured, the layers of pointers, etc? There's no boxing with Ref? (or there is boxing with RefValue?)

immutable MyType1
    a::Int
    b::RefValue{Int}
end

immutable MyType2
    a::Int
    b::Ref{Int}
end

@yuyichao
Copy link
Contributor

yuyichao commented Sep 7, 2016

Memory layout and boxing is not the issue, none of them are inlined and (in another word) both are boxed. Ref{Int} does have type instability.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 7, 2016

right, but that doesn't indicate a problem. Ref already guarantees out-of-line boxing (that's why it exists), and with all of the dispatch caches and such, the generic dispatch shouldn't cost much more or less than that required allocation. So the intent is that you are better off not pre-optimizing for it. If you want to make it faster, in v0.6, it may actually be better to back it with a RefArray (implying you are doing the memory management yourself) optimizing for reduced allocation, or by passing around just the Ref (allowing function specialization to take effect) optimizing for reduced pointer chasing – which one is better will depend on the usage profile of the application (lots of Ref or lots of loops). And then it turns out that this pre-optimization of the field would have restricted the user from switching to a better algorithm later.

@kshyatt kshyatt added the domain:types and dispatch Types, subtyping and method dispatch label Sep 7, 2016
@yuyichao
Copy link
Contributor

yuyichao commented Sep 7, 2016

and with all of the dispatch caches and such, the generic dispatch shouldn't cost much more or less than that required allocation.

Even if the dynamic dispatch is always as fast, which is not always the case, it still prevent inlining and prevent LLVM from seeing the memory operation. I'll be surprised if we can make this claim (that type instability for mutable types won't cause performance issue) in general.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 7, 2016

Perhaps not, but you can't make that argument about type stability either, in general. As I said above in my wall of text, if the performance issue would be fixed by getting LLVM to see the memory operation, then it's presumably even better to unwrap the type earlier and avoid the extra pointer indirections (and eventually doing MemSSA and allocation-elimination in type inference should handle that automatically).

Alternatively, as a Ref, you could be storing a Ptr there (so that there's no allocation on the convert pass at all, and forcing the memory management fully manual) or a RefArray (so that the memory allocation for the boxes is pooled, and memory management is partly manual).

@yuyichao
Copy link
Contributor

yuyichao commented Sep 7, 2016

allocation-elimination in type inference should handle that automatically

Right, this is much harder if the field type is not a leaf type.

@StefanKarpinski StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016
@pabloferz
Copy link
Contributor

Another reason for renaming Ref and RefValue would be #18965.

@pabloferz pabloferz mentioned this pull request Oct 17, 2016
@StefanKarpinski StefanKarpinski modified the milestones: 1.0, 0.6.0 Dec 15, 2016
@JeffBezanson JeffBezanson modified the milestones: 2.0+, 1.0 May 2, 2017
@StefanKarpinski StefanKarpinski modified the milestones: 2.0, 1.x Jan 22, 2019
@mbauman
Copy link
Sponsor Member Author

mbauman commented May 1, 2019

This likely isn't going to be the direction forward here and I'm almost certainly not going to be the one to take the next steps here so I'll close this.

@mbauman mbauman closed this May 1, 2019
@mbauman mbauman deleted the mb/consttypefields branch May 1, 2019 15:21
vtjnash added a commit that referenced this pull request Dec 2, 2021
Mark some builtin types also, although Serialization relies upon being
able to mutilate the Method objects, so we do not yet mark those.

Replaces #11430

Co-authored-by: Matt Bauman <[email protected]>
vtjnash added a commit that referenced this pull request Dec 10, 2021
Mark some builtin types also, although Serialization relies upon being
able to mutilate the Method objects, so we do not yet mark those.

Replaces #11430

Co-authored-by: Matt Bauman <[email protected]>
vtjnash added a commit that referenced this pull request Dec 17, 2021
Mark some builtin types also, although Serialization relies upon being
able to mutilate the Method objects, so we do not yet mark those.

Replaces #11430

Co-authored-by: Matt Bauman <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
Mark some builtin types also, although Serialization relies upon being
able to mutilate the Method objects, so we do not yet mark those.

Replaces JuliaLang#11430

Co-authored-by: Matt Bauman <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
Mark some builtin types also, although Serialization relies upon being
able to mutilate the Method objects, so we do not yet mark those.

Replaces JuliaLang#11430

Co-authored-by: Matt Bauman <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ability to mark fields of types as immutable-const