WIP: Allow constant fields in mutable types #11430

mbauman · 2015-05-24T20:55:50Z

This was a fun afternoon hack. There are still a few loose ends to wrap up, but I was surprised how easy this turned out to be. I want to test the waters here… is this something that is wanted? It would close #9448, and it trivially enables the sorts of optimizations I am seeking in #9974 if you're able to mark the field constant. Upon marking BitArray's chunks array as constant, it gives the same sorts of speedups I saw in #11403 (but more systematically and without going through contortions or having trouble with pointer lookups within loops; the BitArray code could be simplified more now, too).

Some todos:

Is this a feature we want? Are there advantages to this over an immutable type with mutable references?
Better error messages for re-assignment attempts
This allows const declarations without an assignment everywhere. In the global scope it behaves how you'd expect (allowing the first assignment but no more), and in the local scope that currently isn't enforced anyway. But f() = const x segfaults. It should probably have the same error message as f() = global x.
Some code cleanup and renaming. It's currently setting field[i].isconst within jl_compute_field_offsets… which isn't really the right name or place to do this.
This steals a bit from the maximum field size, reducing it to 2^14-1. This is really the only place where this might be breaking. Is that ok? I'm not sure if there's really a sensible alternative.

Further projects:

In "ability to mark fields of types as immutable-const" (#9448, comment), @vtjnash suggests that this may allow some other general code sharing between isbits and immutable, but I don't understand src well enough yet to grasp what that entails.
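For readers skimming the thread, here is a minimal sketch of what a constant field in a mutable type looks like. It is written in the syntax that later landed through #43305 (it assumes Julia 1.8 or newer), not in the syntax of this WIP branch, and the type name `Bits` is hypothetical.

```julia
# Hypothetical type for illustration; assumes Julia 1.8 or newer.
mutable struct Bits
    const chunks::Vector{UInt64}  # the binding is fixed after construction
    len::Int                      # an ordinary field that can be reassigned
end

b = Bits(zeros(UInt64, 1), 0)
b.len = 10            # allowed: ordinary mutable field
b.chunks[1] = 0x1     # allowed: the array's contents are still mutable

# Rebinding the const field is rejected at run time.
rebind_failed = try
    b.chunks = UInt64[]
    false
catch err
    err isa ErrorException
end
```

Only the field binding is constant; the array it points to can still be modified in place, which is what lets a compiler assume `b.chunks` refers to the same array across a loop.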

mbauman · 2015-05-24T21:03:06Z

cc: @carlobaldassi. I also played with making BitArrays immutable, and while it sped a few operations up, ~~the extra pointer dereference and cache non-locality in having a mutable Ref for length really killed performance in general.~~ (Edit: see below) See the mb/frozenbits branch on my fork.

JeffBezanson · 2015-05-24T21:03:38Z

This seems to be done well.

I think the more usual way to get this functionality is to use mutable Ref cells inside immutable objects. That is much simpler, since you only need the concept of mutability that already exists, instead of adding more features to the data model.

mbauman · 2015-05-24T21:06:16Z

Admittedly, I didn't profile where the extra time was getting spent on my attempt at doing that (see my simultaneous comment above). Perhaps it'd be worth my while to try optimizing an immutable BitArray further.

JeffBezanson · 2015-05-24T21:06:51Z

You would need to use RefValue, since Ref is an abstract type. I do find that naming scheme pretty unfortunate.
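A minimal sketch of the pattern under discussion, assuming current Julia syntax (`struct` rather than the `immutable` keyword of the time). The `Counter` type is hypothetical and only illustrates a mutable cell held by an immutable object.

```julia
# Hypothetical counter type: the struct is immutable, the cell it holds is not.
struct Counter
    name::String
    count::Base.RefValue{Int}   # concrete type; `Ref{Int}` would be abstract
end

c = Counter("hits", Ref(0))     # Ref(0) constructs a Base.RefValue{Int}
c.count[] += 1                  # mutate the contents of the cell
c.count[] += 1

# The field binding itself cannot be replaced on an immutable struct.
rebind_failed = try
    c.count = Ref(5)
    false
catch err
    err isa ErrorException
end
```

The length-like value is updated through `c.count[]`, at the cost of one extra object per `Counter`.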

mbauman · 2015-05-24T21:07:59Z

Aha, I'll give that a shot.

carnaval · 2015-05-24T21:12:23Z

Very cool. I think it would be cleaner if eventually the immutable keyword would only make the frontend rewrite the type body by adding const on every field. We would then flag the type as immutable iff all fields are const.

JeffBezanson · 2015-05-24T21:12:57Z

@vtjnash Can we rename RefValue to Ref to make writing mutable fields nicer? We've generally tried to move away from using nice short names for abstract types. We could call the abstract version ByRef, or give cconvert to Ref special behavior.

mbauman · 2015-05-24T21:22:53Z

Ah, sure enough, that's much better. I don't have much more time to play with it today, but running the BitArray tests on that branch is now just slightly slower than Master. If I can hammer out a few regressions it'll probably be even better for my IntSets refactor (and DataArrays, too) since an immutable BitArray will be stored inline. I'll play with it more later. 👍

vtjnash · 2015-05-24T22:21:43Z

@JeffBezanson in a trend adopted from my work on Gtk.jl (specifically the Mutable experiment that formed the basis for the eventual Ref PR), I tend to want to encourage users to refrain from ever referring to the concrete RefValue type and instead work with the abstract concept of a mutable reference. I realize that this bucks the trend of ensuring that every field is concretely typed, but I'm not convinced that is not simply a premature optimization in such cases.

sjkelly · 2015-05-25T03:09:45Z

I wonder if it would be useful to dispatch if a field is const. Maybe there could be a const type, Const(5) = Const{5}().

JeffBezanson · 2015-05-25T04:18:34Z

@vtjnash Yes there are 2 concerns: we want to use the abstract type for ccall and uses like Ref(x), but we want to use a concrete type for mutable fields. We need to balance those.

How many kinds of references are there, really? In almost every case you want a mutable single-value cell (RefValue). The only exception seems to be Array. We could make passing an Array as a Ref argument to ccall a special case using cconvert.

mbauman · 2015-05-25T14:56:57Z

Alright, I spent a little more time this morning comparing this approach to the immutable + Ref approach in mb/frozenbits. As far as I can tell, there's about a 3x performance regression involved in the creation of an immutable BitVector with its mutable Ref as compared to just allocating one mutable type. If that can be improved, then I think const fields may not be needed.

Edit: I suppose that this is a constant overhead that is only significant on smaller BitArrays. The 3x number above was for a small BitArray of < 64 elements. With larger BitArrays the cost is overwhelmed by the array allocation.

carlobaldassi · 2015-05-25T15:44:18Z

> I think the more usual way to get this functionality is to use mutable Ref cells inside immutable objects. That is much simpler, since you only need the concept of mutability that already exists, instead of adding more features to the data model.

Yet, comparing the implementations, it seems that the const-field approach is simpler from the user's perspective. For example, in the immutable BitArray case one needs two extra parameters to the type which are essentially only an implementation detail. Plus, BitArray{N} would no longer be a concrete type (even though the common cases of BitVector and BitMatrix are).

Performance-wise, I don't know if the regressions which @mbauman observes are solvable. However, I was thinking about code like this:

B = BitArray(N)
for i in mylistofindices
    x += B[i]
end

i.e. without @inbounds because you don't necessarily control mylistofindices. Intuitively, it would seem that the getindex call would need to check the len field every time, which would require dereferencing the pointer, and I doubt that the compiler could hoist that load out of the loop. In the const-field approach, however, it seems much easier for the compiler to figure out that both len and chunks are not changing. Am I making any sense?
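To make the concern concrete, here is a sketch of a bounds-checked lookup for the immutable-plus-Ref layout. The names `RefBits`, `getbit`, and `countbits` are hypothetical and this is not the code from the mb/frozenbits branch; it only shows where the load through the Ref sits relative to the loop.

```julia
# Hypothetical immutable bit vector whose length lives in a Ref cell.
struct RefBits
    chunks::Vector{UInt64}
    len::Base.RefValue{Int}
end

function getbit(B::RefBits, i::Int)
    n = B.len[]                              # load through the Ref on every call
    1 <= i <= n || throw(BoundsError(B, i))  # bounds check depends on that load
    chunk = B.chunks[((i - 1) >> 6) + 1]     # 64 bits per chunk
    return ((chunk >> ((i - 1) & 63)) & 0x1) == 0x1
end

function countbits(B::RefBits, indices)
    x = 0
    for i in indices
        x += getbit(B, i)                    # no @inbounds: indices are untrusted
    end
    return x
end

B = RefBits(UInt64[0b1011], Ref(4))          # bits 1, 2 and 4 are set
total = countbits(B, [1, 2, 3, 4, 1])
```

Whether the compiler can prove that `B.len[]` is unchanged across iterations is exactly the open question in this comment; the sketch makes no claim either way.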

JeffBezanson · 2015-05-25T15:57:08Z

What would the two extra type parameters be? I don't quite get that.

mbauman · 2015-05-25T16:00:23Z

> What would the two extra type parameters be? I don't quite get that.

It's just a hack to only have mutable lengths on BitVectors while concretely typing all fields. Really, this is crying for a @generated type. See my comments here: https://github.com/mbauman/julia/blob/mb/frozenbits/base/bitarray.jl#L7-L10

mbauman · 2015-05-25T16:09:36Z

Effectively, it's

@generated immutable BitArray{N}
    quote
        chunks::Vector{UInt64}
        len::$(N == 1 ? RefValue{Int} : Int)
        dims::$(N == 1 ? Void : NTuple{N, Int})
        ...
    end
end
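Since a `@generated` type does not exist, the same effect needs explicit type parameters. The following is a rough sketch of that workaround in current Julia syntax, under the assumption that the two extra parameters carry the types of `len` and `dims`; the names `FrozenBits`, `frozenvector`, and `frozenmatrix` are hypothetical and not taken from the branch.

```julia
# Hypothetical layout; L and D are the two "extra" type parameters.
struct FrozenBits{N, L, D}
    chunks::Vector{UInt64}
    len::L     # Base.RefValue{Int} when N == 1, plain Int otherwise
    dims::D    # Nothing when N == 1, NTuple{N, Int} otherwise
end

# Growable one-dimensional case: the length is stored in a mutable cell.
function frozenvector(n::Int)
    chunks = zeros(UInt64, cld(n, 64))
    return FrozenBits{1, Base.RefValue{Int}, Nothing}(chunks, Ref(n), nothing)
end

# Fixed-size two-dimensional case: length and dims are plain values.
function frozenmatrix(m::Int, n::Int)
    chunks = zeros(UInt64, cld(m * n, 64))
    return FrozenBits{2, Int, NTuple{2, Int}}(chunks, m * n, (m, n))
end

v = frozenvector(10)
v.len[] += 1              # only the vector case can change its length
m = frozenmatrix(3, 4)
```

With this layout `FrozenBits{1}` alone is not a concrete type, which is the drawback raised earlier in the thread for `BitArray{N}`.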

JeffBezanson · 2015-05-25T16:12:58Z

That's not really fair --- it's a feature that you can parameterize over whether something is a RefValue. AFAICT you can't do that with the const declaration. However I agree this is not really desirable; the extra parameters are ugly and all code for the type has to handle that variation.

mbauman · 2015-05-25T16:18:58Z

That is true, I'm not selectively marking the len field constant for non-Vectors in this PR (which, interestingly enough, seems rather insane at first glance… even though it was the first optimization I made on the immutable branch and is effectively the same thing).

StefanKarpinski · 2015-05-25T18:00:25Z

@vtjnash: Fwiw, I also found the Ref vs RefValue business confusing and unintuitive when I first started playing with it. I sympathize with wanting to encourage people to program to abstractions rather than conceptions. That's why AbstractArray was originally called Tensor and AbstractVector was Vector, etc. Dense versions were called Array and DenseVector. That didn't work out well and we changed it before anyone but Jeff, Viral and I had used it. To me this Ref and RefValue naming is the same story all over again. Bottom line, I think that simple concrete names should be used for simple concrete things; programming correctly to an abstraction requires awareness and care, so having a name that reminds you of that is a good thing.

andyferris · 2016-09-06T09:19:48Z

> We've generally tried to move away from using nice short names for abstract types. We could call the abstract version ByRef

@JeffBezanson Wouldn't the first sentence imply that the abstract type could be Reference, with Ref as the concrete (and RefArray as another concrete)?

I've also been confused/bitten by trying to use Ref as a concrete type, and I've seen the same problem in other people's code. PS - will RefArray go away when/if Buffer is implemented and Array is fully-Julian?

vtjnash · 2016-09-06T18:55:32Z

You generally shouldn't need to be coding against the subtypes of Ref, since the abstract behavior is more useful than its concrete implementations. And in fact, the behavior needs to be abstract, since the behavior of the concrete object is different, and already defined by the language, and generally not what you want.

No, the existence of RefArray isn't due to how Array is implemented, it is required by the definition of Ref. Its implementation might change, but I wouldn't really expect it to, and that's internal anyways.

andyferris · 2016-09-06T23:28:47Z

OK, thanks Jameson. I should try and understand RefArray a little better. What I meant to say is that sometimes I want to use something along the lines of

immutable MyType
    a::Int
    b::Ref{Int}
end

and if you want to use (faster) concrete types you need to realize if you actually want RefValue here. This pattern will be quite useful if/when such an immutable gets allocated on the stack (as a struct with an Int and a pointer).

vtjnash · 2016-09-06T23:53:36Z

there's nothing wrong with a field type of Ref{Int}

andyferris · 2016-09-07T01:06:27Z

Really? That would be nice if they are equally economical. I'm curious: are these two completely the same in the way the binary data is structured, the layers of pointers, etc? There's no boxing with Ref? (or there is boxing with RefValue?)

immutable MyType1
    a::Int
    b::RefValue{Int}
end

immutable MyType2
    a::Int
    b::Ref{Int}
end

yuyichao · 2016-09-07T01:10:57Z

Memory layout and boxing are not the issue: neither of them is inlined (in other words, both are boxed). Ref{Int} does have type instability, though.
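The difference can be checked by reflection. This is a small sketch in current Julia syntax (`struct` instead of `immutable`); the type names `WithRefValue` and `WithRef` are hypothetical stand-ins for MyType1 and MyType2.

```julia
struct WithRefValue
    a::Int
    b::Base.RefValue{Int}   # concrete field type
end

struct WithRef
    a::Int
    b::Ref{Int}             # abstract field type
end

# Field types as declared on the struct.
ref_value_is_concrete = isconcretetype(fieldtype(WithRefValue, :b))
ref_is_concrete = isconcretetype(fieldtype(WithRef, :b))

# Both structs can hold the very same kind of cell at run time.
x = WithRefValue(1, Ref(2))
y = WithRef(1, Ref(2))
same_runtime_type = typeof(x.b) === typeof(y.b)
```

The stored object is identical in both cases; what differs is that the compiler can only infer `Ref{Int}` for `y.b`, so code reading `y.b[]` may need a dynamic dispatch.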

vtjnash · 2016-09-07T02:31:59Z

right, but that doesn't indicate a problem. Ref already guarantees out-of-line boxing (that's why it exists), and with all of the dispatch caches and such, the generic dispatch shouldn't cost much more or less than that required allocation. So the intent is that you are better off not pre-optimizing for it. If you want to make it faster, in v0.6, it may actually be better to back it with a RefArray (implying you are doing the memory management yourself) optimizing for reduced allocation, or by passing around just the Ref (allowing function specialization to take effect) optimizing for reduced pointer chasing – which one is better will depend on the usage profile of the application (lots of Ref or lots of loops). And then it turns out that this pre-optimization of the field would have restricted the user from switching to a better algorithm later.

yuyichao · 2016-09-07T08:14:48Z

> and with all of the dispatch caches and such, the generic dispatch shouldn't cost much more or less than that required allocation.

Even if the dynamic dispatch is always as fast, which is not always the case, it still prevents inlining and prevents LLVM from seeing the memory operation. I'll be surprised if we can make this claim (that type instability for mutable types won't cause performance issues) in general.

vtjnash · 2016-09-07T15:20:09Z

Perhaps not, but you can't make that argument about type stability either, in general. As I said above in my wall of text, if the performance issue would be fixed by getting LLVM to see the memory operation, then it's presumably even better to unwrap the type earlier and avoid the extra pointer indirections (and eventually doing MemSSA and allocation-elimination in type inference should handle that automatically).

Alternatively, as a Ref, you could be storing a Ptr there (so that there's no allocation on the convert pass at all, and forcing the memory management fully manual) or a RefArray (so that the memory allocation for the boxes is pooled, and memory management is partly manual).

yuyichao · 2016-09-07T15:23:25Z

> allocation-elimination in type inference should handle that automatically

Right, this is much harder if the field type is not a leaf type.

pabloferz · 2016-10-16T13:52:29Z

Another reason for renaming Ref and RefValue would be #18965.

mbauman · 2019-05-01T15:21:29Z

This likely isn't going to be the direction forward here and I'm almost certainly not going to be the one to take the next steps here so I'll close this.

Mark some builtin types also, although Serialization relies upon being able to mutilate the Method objects, so we do not yet mark those. Replaces #11430 Co-authored-by: Matt Bauman <[email protected]>


mbauman added 2 commits May 24, 2015 16:40

Allow const declarations, read and store const field info

16c96c2

Make the BitArray chunks field constant

dc300d7

This was referenced Jul 29, 2015

Improve prime sieve performance #12025

Merged

Introduce Buffer type and make Array an abstraction on top of it #12447

Closed

tkelman mentioned this pull request May 18, 2016

RFC: make SparseMatrixCSC immutable #16371

Merged

kshyatt added the domain:types and dispatch Types, subtyping and method dispatch label Sep 7, 2016

StefanKarpinski added this to the 0.6.0 milestone Sep 13, 2016

pabloferz mentioned this pull request Oct 17, 2016

Renaming Ref #18990

Closed

StefanKarpinski modified the milestones: 1.0, 0.6.0 Dec 15, 2016

JeffBezanson modified the milestones: 2.0+, 1.0 May 2, 2017

vtjnash referenced this pull request Nov 2, 2018

make dict immutable

b2c8c63

StefanKarpinski modified the milestones: 2.0, 1.x Jan 22, 2019

mbauman closed this May 1, 2019

mbauman deleted the mb/consttypefields branch May 1, 2019 15:21

mbauman mentioned this pull request Aug 23, 2019

Speed up scalar BitArray indexing by ~25% #11403

Closed

KristofferC mentioned this pull request Aug 26, 2020

Mutable fields, not structs? #37216

Closed

vtjnash mentioned this pull request Dec 2, 2021

Allow const declarations on mutable fields #43305

Merged
