Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Refactor indexing to use Varargs and ReshapedArrays #16251

Merged
merged 8 commits into from
May 12, 2016
Merged

Conversation

mbauman
Copy link
Sponsor Member

@mbauman mbauman commented May 7, 2016

This puts some of the amazing new infrastructure through its paces, on our way towards #14770. Using a combination of Vararg and ReshapedArrays, we can always ensure that scalar, non-scalar, and view indexing all get indexed at an optimal dimensionality. This is slightly more verbose than would otherwise be necessary — it explicitly defines separate methods to support the general linear indexing cases that are on their way towards deprecation.

TODO:

  • @nanosoldier runbenchmarks(ALL)
  • Bikeshed the reshape(A, Val{N}) method. The current implementation name is just a stop-gap… as I was completing this I was thinking about doing something like this:
typealias Ravel{N, T} ReshapedArray{T, N}
(::Ravel{N}){N}(::AbstractArray) = # do the reshaping to N dimensions

@mbauman mbauman force-pushed the mb/varargs branch 2 times, most recently from 37233cd to d2ac8f6 Compare May 7, 2016 19:05
## LinearFast Scalar indexing: canonical method is one Int
_getindex(::LinearFast, A::AbstractArray, ::Int) = error("indexing not defined for ", typeof(A))
_getindex(::LinearFast, A::AbstractArray, i::Real) = (@_propagate_inbounds_meta; getindex(A, to_index(i)))
function _getindex{T,N}(::LinearFast, A::AbstractArray{T,N}, I::Vararg{Real,N})
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My impression is that this is still slow without @inline, because of the lack of call specialization (which will hopefully happen someday, see #16159).

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, nvm, next line 😦

@timholy
Copy link
Sponsor Member

timholy commented May 7, 2016

An awe-inspiring simplification---I really like how you've used the new reshape to such tremendously good effect. 💯

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 7, 2016

I'm still having difficulty getting SubArray to play along with this whole scheme. I think I finally got something that will pass tests now… it'll be very interesting to see what it's performance is like, since we end up with ridiculously complicated nested structures from taking linear views into SubArrays.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 7, 2016

Does Nanosoldier hear me from within a bulleted list? Let's try it on a line by itself:

@nanosoldier runbenchmarks(ALL, vs = "JuliaLang/julia:master")

@kshyatt kshyatt added the domain:arrays [a, r, r, a, y, s] label May 7, 2016
@jrevels
Copy link
Member

jrevels commented May 7, 2016

Sorry, Nanosoldier isn't back online yet. I'm bringing it back up tomorrow, I'll trigger the build here once it's available.

@timholy
Copy link
Sponsor Member

timholy commented May 7, 2016

Looks like you're almost there. The BitArray tests are definitely hard-core when it comes to specifying the return types. The same issue came up in #15449.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 7, 2016

Yup, it's a really wonderful catch and a simple fix — we don't want to let the reshaped view escape! I've cleaned things up so tests should pass at every stage along the way (despite how GitHub wants to display the order of the commits).

Thanks @jrevels. In the meantime, I did a quick run of test/perf/array.jl… and it's looking quite good. The huge wins are linear indexing into linear slow arrays, and there's a big win for logical indexing into BitArray, too. No tests are more than 20% worse, and there's not an obvious pattern to the 17 that are more than 10% worse.

min_relative

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 8, 2016

Any thoughts on the spelling of the reshape(A, Val{N}) API? No matter what, we need to pass the number of dimensions as some sort of type parameter… but should we use Val for this API? We already have a whole slew of types that have dimension parameters — could we use the N in some array type?

  • convert(AbstractArray{T,N}, …) is already pretty overloaded, and it has the downside that you'd need to use a special typealias to just express AbstractDimsFirstArray{N} in any case.
  • Numpy uses the verb ravel for this operation (well, the flattening to 1-dimension part). We could define typealias Ravel{N,T} ReshapedArray{T,N}, and then either use convert(Ravel{N}, A) or Ravel{N}(A). It is a little strange, though, since it's not guaranteed to return a ReshapedArray.
  • Or maybe we just define and export an empty immutable Ravel{N} end type just for this task. But again it's odd to use construction or conversion here since you're doing neither.
  • Or just stick with reshape(A, Val{N}). It's simple, it works, and it has no extra exports.

@timholy
Copy link
Sponsor Member

timholy commented May 8, 2016

I don't like the convert option because it's not entirely clear that this is non-copying. Either the Val{N} or the Ravel{N} seem the best choices to me, and I don't have strong feelings about which is better.

@timholy
Copy link
Sponsor Member

timholy commented May 8, 2016

Another option is reshape(A, ntuple(d->Colon(), Val{N})), although that's a lot to force users to type just to reshape to N dimensions.

@jrevels
Copy link
Member

jrevels commented May 9, 2016

The maiden voyage of the new Nanosoldier configuration:

@nanosoldier runbenchmarks(ALL, vs = ":master")

@mbauman mbauman mentioned this pull request May 9, 2016
@nanosoldier
Copy link
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@jrevels
Copy link
Member

jrevels commented May 9, 2016

Seems like I need to raise tolerances for the Big* scalar benchmarks, but those indexing improvements are pretty awesome!

EDIT: Seems like the status icon didn't update even though the job is finished - I'll have to look into that...

@timholy
Copy link
Sponsor Member

timholy commented May 9, 2016

@jrevels, that also seems a lot faster than I remember it. If so, nice!

@jrevels
Copy link
Member

jrevels commented May 9, 2016

@jrevels, that also seems a lot faster than I remember it. If so, nice!

Thanks! The new execution strategy we're using has cut down the runtime for a single pass of the full suite from ~4 hours to around ~30 minutes; expected turnaround time for full PR builds like this one is down from ~11 hours to ~3 hours.

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 9, 2016

Thanks so much @jrevels, this is great.

I'm okay with the current reshape(A, Val{N}) API — it's un-opinionated and is easy to change in the future. This is ready to go as far as I'm concerned. Any other comments?

else
:(@_propagate_inbounds_meta; (merge_indexes(V, idxs, subidxs),))
:(error("mismatched index dimensionalities"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any way of making this more descriptive if it ever happens?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be better to call this "rank" instead of "dimensionality"?

Copy link
Sponsor Member Author

@mbauman mbauman May 9, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I should probably enforce this in the constructor. The whole point here is to ensure that the number of indices stored in the SubArray matches the number of dimensions in the parent array. Even if you do manage to create a corrupted SubArray, though, you'd almost certainly hit a missing method in the other reshape reindex signatures long before you ever get here.

We've typically stayed away from using the word rank to mean ndims because of rank.

@tkelman tkelman merged commit 2c16e98 into master May 12, 2016
@tkelman tkelman deleted the mb/varargs branch May 12, 2016 16:16
@vtjnash
Copy link
Sponsor Member

vtjnash commented May 12, 2016

I seem to be consistently getting a failure in the core.jl test after this merged. It sometimes goes away with inlining, and can be prevented with --inline=no but the code is all crazy & missing a bounds check:

julia> [][] # should throw a bounds error
ERROR: UndefRefError: access to undefined reference
 in getindex(::Array{Any,1}) at abstractarray.jl:949
 in eval(::Module, ::Any) at ./boot.jl:226
julia> @code_typed [][]
1-element Array{Any,1}:
 LambdaInfo for getindex
:(begin  # abstractarray.jl, line 424:
        $(Expr(:meta, :propagate_inbounds)) # abstractarray.jl, line 425:
        $(Expr(:inbounds, false)) # abstractarray.jl, line 447: # abstractarray.jl, line 448: # abstractarray.jl, line 449:
        $(Expr(:boundscheck, true))
        true
        $(Expr(:boundscheck, :pop)) # abstractarray.jl, line 450:
        $(Expr(:inbounds, true))
        $(Expr(:inbounds, false))
        (Base.arraysize)(A,1)::Int64 # abstractarray.jl, line 949:
        $(Expr(:inbounds, :pop))
        r = (Base.arrayref)(A,1)
        $(Expr(:inbounds, :pop)) # abstractarray.jl, line 451:
        $(Expr(:inbounds, :pop))
        return r
    end)

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 12, 2016

Ah, wait. Maybe it's not the direct fault of this PR. The zero-arg bounds check test is wrong:

base/abstractarray.jl
 138 _internal_checkbounds(A::AbstractArray) = true

@timholy
Copy link
Sponsor Member

timholy commented May 12, 2016

Where does that go wrong?

(I ask because I'm in the middle of revamping checkbounds yet again.)

@mbauman
Copy link
Sponsor Member Author

mbauman commented May 12, 2016

A must have at least one element for A[] to work.

This is unrelated, but It's unfortunately another case of a probably-should-be-ambiguous dispatch (#11242 (comment)):

julia> @which Base._getindex(Base.LinearFast(), A)
_getindex(::Base.LinearFast, A::AbstractArray, I::Real...) at abstractarray.jl:447
# I'd expect:
# _getindex{T,N}(::LinearFast, A::AbstractArray{T,N}, I::Vararg{Real,N}) at abstractarray.jl:438

@timholy
Copy link
Sponsor Member

timholy commented May 12, 2016

Oh, duh, right.

@vtjnash
Copy link
Sponsor Member

vtjnash commented May 13, 2016

the "obvious" fix seems to work for, is there any reason to do something more complicated:

diff --git a/base/abstractarray.jl b/base/abstractarray.jl
index 0972cae..6522bc3 100644
--- a/base/abstractarray.jl
+++ b/base/abstractarray.jl
@@ -135,7 +135,7 @@ throw_boundserror(A, I) = (@_noinline_meta; throw(BoundsError(A, I)))
 checkbounds(A::AbstractArray, I...) = (@_inline_meta; _internal_checkbounds(A, I...))
 # The internal function is named _internal_checkbounds since there had been a
 # _checkbounds previously that meant something different.
-_internal_checkbounds(A::AbstractArray) = true
+_internal_checkbounds(A::AbstractArray) = _internal_checkbounds(A,1)
 _internal_checkbounds(A::AbstractArray, I::AbstractArray{Bool}) = size(A) == size(I) || throw_boundserror(A, I)
 _internal_checkbounds(A::AbstractArray, I::AbstractVector{Bool}) = length(A) == length(I) || throw_boundserror(A, I)
 function _internal_checkbounds(A::AbstractArray, I1, I...)

@timholy
Copy link
Sponsor Member

timholy commented May 13, 2016

LGTM. Presumably could also just check !isempty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:arrays [a, r, r, a, y, s]
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

8 participants