
Unified, efficient code for mutable and immutable AbstractArrays #32

Closed
c42f opened this issue Sep 20, 2016 · 10 comments
Labels
design: speculative design related issue

Comments

@c42f
Member

c42f commented Sep 20, 2016

As mentioned in #31, nobody wants to write two versions of code: one for mutable arrays and one for immutable arrays like StaticArray. I'd like a central place where we identify the main issues and discuss minimal changes to the AbstractArray interface or the Julia compiler that would allow packages like StaticArrays to exist without rewriting a lot of algorithms from Base. Here's a very incomplete list; I hope people will jump in with more detail:

API

  1. similar() assumes a mutable container can be created and efficiently filled in an iterative way. Julep: setfield! for mutable references to immutables JuliaLang/julia#17115 goes some way toward solving this, though we'd be working with a statically sized buffer, something like Ref{T<:StaticArray}, and would need to unwrap it to return the value.
  2. The hvcat() API doesn't reflect the structure of the input expression AST in a way that can give a type-inferred output. (Potentially solvable with constant prop, without API changes?)
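For reference, here's a sketch of the functional construction immutable arrays fall back on today (assuming the StaticArrays map machinery of the time behaves as advertised): since there's no similar()-then-setindex! loop available, the whole result is built in one shot.

```julia
using StaticArrays

v = SVector(1, 2, 3)
# The entire output is constructed at once from the input, so the
# result type is inferable from the input type alone - no mutable
# intermediate buffer is ever needed.
w = map(x -> 2x, v)  # still an SVector{3,Int}
```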

Optimization

  1. similar() with a size argument is not type-inferable for StaticArray. Constant prop in inference could potentially fix this in many cases; ref. Do some constant propagation before type inference runs JuliaLang/julia#5560.
  2. getindex() with fancy indexing is not type-inferable for StaticArray. Constant prop in inference could fix a bunch of important instances of this, e.g. literal ranges: v[3:5] could be known to be an SVector{3}. But what about cases that are non-inferable even after constant prop? We'd like to return an Array in those cases; can we achieve this in principle?
  3. Loop unrolling - small static arrays can benefit a lot from this, but in StaticArrays we're forced to do it at the AST level via @generated functions, which no doubt pushes the compiler pretty hard and generates a heap of garbage. More interesting would be an @unroll meta to instruct codegen to unroll a given loop if the range is statically known.
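To make item 2 concrete, here's a sketch of the gap (hedged: the exact indexing behavior depends on the StaticArrays methods of the day):

```julia
using StaticArrays

v = @SVector rand(8)
# Today the static length must live in the index's *type* - e.g. a
# tuple index - for the result to be a static array:
v[(3, 4, 5)]  # SVector{3}
# With constant propagation into inference, a literal range could carry
# the same information:
v[3:5]        # could be known to be an SVector{3}; today it's not inferable
```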

Obviously one thing we can't fix is differences in the most appropriate algorithm for small vs. large objects. For example, the eigendecomposition of a 2x2 matrix should probably just be done using a specialized analytical expression for that size.
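For instance, a specialized small-size path might look like this sketch for a symmetric 2x2 matrix (eigvals2x2 is a hypothetical helper, not part of any package):

```julia
# Eigenvalues of the symmetric 2x2 matrix [a b; b c], from the
# closed-form solution of the characteristic quadratic:
#   lambda = (a+c)/2 +- sqrt(((a-c)/2)^2 + b^2)
function eigvals2x2(a, b, c)
    m = (a + c) / 2                    # mean of the diagonal
    r = sqrt(((a - c) / 2)^2 + b^2)    # half-spread of the eigenvalues
    return (m - r, m + r)
end

eigvals2x2(2.0, 0.0, 2.0)  # (2.0, 2.0)
eigvals2x2(0.0, 1.0, 0.0)  # (-1.0, 1.0)
```

No iteration, no allocation: exactly the kind of code a general-purpose algorithm from Base can't be expected to produce for tiny fixed sizes.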

@andyferris
Member

getindex() with fancy indexing is not type inferrable for StaticArray... We'd like to return an Array in those cases, can we achieve this in principle?

Already done :)

m = @SMatrix rand(3,3)
m[1,:] # SVector{3}
m[(1,),:] # SMatrix{1,3}
m[[1],:] # 1x3 Matrix
rand(3)[(1,2,3)] # SVector{3}

Or at least that's the way it should work... I believe there are missing methods, especially beyond 2D.

@c42f
Member Author

c42f commented Sep 20, 2016

Sure, that's great. What I was getting at is that if we did have constant propagation, we'd get into issues like:

v = @SVector [1,2,3]

v[1:2]   # case 1: literal range
v[range] # case 2: range known only at runtime

For case 1, we could potentially return an SVector in a wonderful world of integrated const prop + inference.
For case 2, what if range isn't known until runtime? How would this be consistent with the desire to return an SVector in the case above?

Admittedly this is pretty specific and speculative. I'm sure there are some much more basic and important issues that should be in the list above.

@andyferris
Member

Thanks for opening this issue Chris.

I have thought about this deeply before. From memory, the compiler changes necessary to make things nice are:

  • 1: Have inlining run to convergence along with constant propagation and inference. In my mind this should just be a part of the main inference loop, but @JeffBezanson was worried this might get complex, while if they are iterated in turn until convergence, it may slow down compilation too much.
  • 2: The mentioned @unroll macro. It would work with any for loop over a constant iterable (literals as well as other Consts in inference), and would need to happen during or after inference, or during codegen. Alternatively, we could have a heuristic for having the compiler do this automatically - but to my mind, @inline, @noinline and default heuristics for inlining were the correct choice, so the same applies here.

There might have been others. These two (in addition to more robust constant prop and branch elimination) would, I think, remove all the @generated functions from StaticArrays.jl. They also solve the problem of calling similar(a, newsize) and having newsize produce an inferable return type.
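For context, the kind of AST-level unrolling that @generated currently forces - and that an @unroll meta could replace - looks roughly like this sketch (unrolled_sum is hypothetical, written in the parametric-method style of the era):

```julia
using StaticArrays

# The loop body is spliced out N times at compile time, because N is
# available in the type domain:
@generated function unrolled_sum{N,T}(v::SVector{N,T})
    exprs = [:(s += v[$i]) for i = 1:N]
    quote
        s = zero(T)
        $(exprs...)
        s
    end
end
# With an @unroll meta, the same thing could be an ordinary loop:
#   s = zero(T); @unroll for i = 1:N; s += v[i]; end
```

The @generated version works, but it means every size instantiates a fresh method body at the AST level, which is exactly the compiler load and garbage described above.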

If we want fast mutable containers, then we need:

  • 3: Some kind of mutable container that can live on the stack.

With this final one, I think we can fix all the API issues with JuliaLang/julia#17115. My idea is:

  • 4: Have similar always return a Ref{ArrayType} and make sure people know to unwrap it once they have finished mutating it ("publish" it, so to speak, a term that came up in the JuliaCon talk on DistributedArrays).

I realize the last is pretty API-breaking... if that concept won't fly, then at the least we could define similar_ref and make all the functions in Base use it (and advertise its existence to users). At this point StaticArrays.jl would be so small that we might investigate integrating it into Base.

The final part of the API I would like to touch on is dispatch-by-size. If you want to do this with StaticArrays, we can do it with a trait/thunk:

f(x::StaticVector) = _f(Val{length(typeof(x))}, x)
_f(::Type{Val{3}}, x) = ... # code for size 3
_f(::Type{Val{4}}, x) = ... # code for size 4
_f{N}(::Type{Val{N}}, x) = error("Expected length 3 or 4")

However, trying this with a standard Vector will not be type-stable or fast. I'm not sure how to integrate this... currently we can ignore dispatch and do:

function g(x::AbstractVector)
    if length(x) === 3
        # code for size 3
    elseif length(x) === 4
        # code for size 4
    else
        error("Expected length 3 or 4")
    end
end

but unfortunately a user couldn't later add a definition that makes sense for length 5.
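For contrast, with the Val-based trait form earlier in this comment, a downstream user could add a new size later - a hypothetical sketch:

```julia
# Because dispatch is on Val{N}, a later package can add a new length
# without touching the original definitions:
_f(::Type{Val{5}}, x) = sum(x)  # hypothetical behavior for length 5
```

The runtime if/elseif version bakes the supported sizes into one function body, which is exactly what blocks that kind of extension.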

Sorry - that was a bit of a disorganized mind-dump - I hope it makes some sense.

@andyferris
Member

For 1 we can potentially return SVector in a wonderful world of integrated const prop + inference.
For 2, what if range isn't known until runtime? How will this be consistent with the desire to return SVector in the case above?

I think we might need static ranges for that, e.g. SUnitRange or something. Kind of disgusting.

On the other hand... if this occurs, it is because the return type is inferable. If SVector were simply mutable then returning this wouldn't be a problem (well, unless someone tried to resize it, but that ability might disappear for Vector anyway).

@andyferris
Member

One idiom that I'm currently trying to work around is this sort of thing:

n = length(v)
v2 = similar(v, 2*n)

(or similarly, make a vector whose length is the minimum dimension of a matrix, etc.)

The 2*n will be a compile-time constant soon (in v0.6) when the pure function/constant propagation stuff becomes more powerful. Currently length(typeof(v)) is pure for a static vector, so that is OK. However, the similar function is not pure and won't return a constant - it has to allocate at runtime if v isn't isbits.

One thought is to have

@inline similar(SV::StaticVector, s::Int) = similar_type(SV, s)()
# etc

where similar_type is @pure. We need the iterated inlining and inference for that to work properly.

In summary, to make this work we would need the following changes in StaticArrays:

  • similar_type() becomes the type returned by similar() (which must define setindex!())
  • static_type() or something for the current behaviour of a type that takes values upon construction and may or may not define setindex!()

Again, we need this feature in the compiler:

  • Iterated inlining, inference and constant prop

With these, more of the routines in Base.LinAlg would become type-stable. They still wouldn't be as fast as possible for small matrices, etc., but it would be an improvement on the current situation.

@KristofferC
Contributor

Regarding loop unrolling: if the range is statically known, LLVM will unroll the loop if its heuristic finds it beneficial to do so. So is there really a need for an @unroll macro?

@andyferris
Member

Umm, you might be right, but I did some benchmarking of matrix multiplication, and explicit unrolling of up to thousands of individual multiply/add instructions in 3 nested loops was beneficial. I think LLVM would unroll the sum of a three-vector well, but nothing very complex (it would be too conservative).

I also suspect having optimization opportunities happen sooner in Julia would have other benefits. For example, you could iterate over a tuple of mixed types and inference could make this type-stable.

Finally, having automatic heuristics is great, but having optional user control might be even better (depending on whether it gets abused). Inlining is a great model here.

@KristofferC
Contributor

KristofferC commented Oct 7, 2016

While unrolling might be beneficial in microbenchmarks, it might not always carry over to more realistic scenarios, due to icache misses and the instructions pushing other hot data out of the L2 cache. But yeah, user control is probably good (and maybe not too hard to implement?)

@c42f
Member Author

c42f commented Oct 7, 2016

Good point about icache pollution. I'd be a bit doubtful about the performance benefit of fully unrolling vectors larger than size 20 or so in real code (warning: I'm pulling "20" out of thin air here - don't take it seriously, run a benchmark instead ;-) ).

I don't know the StaticArrays implementation that well yet, but in some of my FixedSizeArrays patches I ended up doing some @generated unrolling of arithmetic into tuples simply so that the compiler would do more of the type inference job for me. This avoided a lot of the mess that is handled elsewhere using promote_op, etc. Was this required, @andyferris, or did you get around it some other way in StaticArrays?

@c42f
Member Author

c42f commented Aug 1, 2019

I think I'll close this. There's a bit of interesting discussion here of issues we never completely solved, but I think the compiler has moved on enough that we'd need to take a completely fresh look at the problems.

@c42f c42f closed this as completed Aug 1, 2019
@c42f c42f mentioned this issue Oct 22, 2019
oschulz pushed a commit to oschulz/StaticArrays.jl that referenced this issue Apr 4, 2023