Re-working SubArray #3496

lindahua · 2013-06-22T13:42:58Z

I created this issue to move the discussion in #3224 over here.

I wrote a benchmark script on Gist to compare the performances of different approaches:

Here is what I get:

Benchmark with long columns
with_ref!:  slow-down = 11.1970x     # use a[:,i]
with_sub!:  slow-down = 114.4806x    # use sub(a, 1:m, i)
with_sub1!: slow-down = 1.1974x      # use pointer_to_array
with_sub2!: slow-down = 2.5286x      # use immutable sub-vector

Benchmark with short columns
with_ref!:  slow-down = 261.4301x
with_sub!:  slow-down = 1256.6697x
with_sub1!: slow-down = 38.8170x
with_sub2!: slow-down = 19.0293x

We can observe:

The current version of sub, which creates SubArray is way too slow -- the overhead and indexing cost just dominate.
Using a[:, i] which makes copies also kills performance
Both immutable sub-vector and using pointer_to_array to create a shared-memory view work much better. That said, both lead to significant overhead when the columns are short.
Overall, the performance of creating a shared-memory array using pointer is better -- after all, you simply get the performance of builtin array. The limitation is that this applies only to contiguous blocks.

I may work on this as the first step towards better performance of Julia. But we need some discussion to settle on a approach first.

A problem with immutable is that methods defined on them do not get properly inlined, resulting in performance hit.

The text was updated successfully, but these errors were encountered:

lindahua · 2013-06-22T14:50:51Z

Here is a tentative proposal,

import Base.getindex

abstract AbstractSubArray{T,N,A<:AbstractArray} <: AbstractArray{T,N}

# for contiguous block

immutable ContiguousSubArray{T,N,A<:Array} <: AbstractSubArray{T,N,A}
    parent::A
    dims::NTuple{N,Int}
    offset::Int
end

getindex(a::ContiguousSubArray, i::Int) = a.parent[a.offset + i]
getindex(a::ContiguousSubArray, i::Int, j::Int) = a.parent[a.offset + i + (j - 1) * a.dims[1]]

# for strided block

immutable StridedSubVector{T,A<:Array} <: AbstractSubArray{T,1,A}
    parent::A
    dim1::Int
    offset1::Int
    stride1::Int
end

getindex(a::StridedSubVector, i::Int) = a.parent[a.offset1 + (i - 1) * a.stride1]

immutable StridedSubMatrix{T,A<:Array} <: AbstractSubArray{T,2,A}
    parent::A
    dim1::Int
    dim2::Int
    offset1::Int
    offset2::Int
    stride1::Int
    stride2::Int
end

immutable MultiDimStridedSubArray{T,N,A<:Array} <: AbstractSubArray{T,N,A}
    parent::A
    dims::NTuple{N,Int}
    offsets::NTuple{N,Int}
    strides::NTuple{N,Int}
end

# more generic case where the parent is not an instance of Array

immutable GenericSubArray{T,N,A<:AbstractArray,I<:(RangeIndex...,)} <: AbstractSubArray{T,N,A}
    parent::A
    shape::NTuple{N,Int}
    dims::I    
    strides::NTuple{N,Int}
end

# this allows to define functions that work consistently over arrays with contiguous memory
typealias ContiguousArray{T,N} Union(Array{T,N}, ContiguousSubArray{T})
typealias StridedArray{T,N} Union(Array{T,N}, SubArray{T,N})


# specialized methods to create sub-array
# we need a bunch of these & probably bound checking

function sub{T}(a::Array{T}, ::Colon, j::Int)
    m = size(a, 1)
    ContiguousSubArray{T,1,Array{T}}(a, (m,), m * (j-1))
end

function sub{T}(a::Array{T}, rgn::Range1{Int}, j::Int)
    m = size(a, 1)
    ContiguousSubArray{T,1,Array{T}}(a, (length(rgn),), (rgn[1]-1) + m * (j-1)) 
end

function sub{T}(a::Array{T}, rgn::Range{Int}, j::Int)
    m = size(a, 1)
    StridedSubVector{T,Array{T}}(a, length(rgn), (rgn[1]-1) + m * (j-1), step(rgn))
end

Of course, this is just a small part. But, hopefully, you get the basic idea. If the compiler can get those getindex methods inlined, this can be just as efficient as handcrafted loops. Otherwise, I expect there is about 2x - 3x slow-down, which is still much better than the current SubArray.

If we agree on this general direction, I may then make a pull request and work on it soon.

cc: @ViralBShah @StefanKarpinski @JeffBezanson @timholy

timholy · 2013-06-22T16:40:19Z

I haven't looked this over in detail except to note that this version is at least memory-safe, even if it forces construction. In the long run this may be resolvable, but what I like about this is that it gives us a path forward that won't have to be rewritten in the long term.

lindahua · 2013-06-22T17:10:45Z

An improved version that achieves both good indexing performance & safety:

immutable SubVec{T, A<:Array} <: AbstractArray{T, 1}
    a::A
    ptr::Ptr{T}
    len::Int

    SubVec(a::A, offset::Int, len::Int) = new(a, pointer(a, offset+1), len)
end

getindex(s::SubVec, i::Int) = unsafe_load(s.ptr, i)  
setindex!(s::SubVec, v, i::Int) = unsafe_store!(s.ptr, v, i)
length(s::SubVec) = s.len

function sub{T}(a::Array{T}, ::Colon, j::Int)
    m = size(a, 1)
    SubVec{T,Array{T}}(a, (j - 1) * m, m)
end

Benchmark shows that this works as fast as hand-crafted loops. This version incorporates both the parent and an offseted pointer in the immutable:

Incorporating the parent makes memory-safe
Using the pointer in indexing gets getindex and setindex! inlined

timholy · 2013-06-22T17:26:31Z

Very cool! I do think it's probably worth having a dedicated 1d type, since getting good performance is so much easier and it's a very common case.

lindahua · 2013-06-22T17:31:20Z

Not only 1D vectors, any sub-array that is contiguous can in principle get such performance.

However, I just find another issue -- any attempt to do bound checking of getindex and setindex! just kills performance, leading to about 4x - 5x slow down.

timholy · 2013-06-22T18:16:43Z

For regular arrays, Jeff does this in the compiler (arrayref, see the check in cgutils.cpp:370). Presumably we'd want this, too.

timholy · 2013-06-22T18:26:03Z

Presumably, it's an inlining issue? (I usually check this by the profiler---you can tell by the nesting depth---but if you can read the disassembly it's more direct.)

ViralBShah · 2013-06-23T11:41:26Z

I like this approach, and it is much better thought out than what I was thinking of. Let's certainly replace the sub-array implementation with a better one.

On getindex(Array, I, J), I am not sure how to proceed. I mainly worry about returning a view (in cases where the region is contiguous, or at least strided), vs. returning a new array in cases where I and J are arbitrary vectors. In order to keep things consistent, and the communication from the implementer to the user simple, it may be best to let getindex keep returning a copy. SubArray already returns views and we can make that higher performance with the ideas in here.

@lindahua Perhaps you should also try disabling the bounds check in the yet to be merged #3268

timholy · 2013-06-23T12:20:19Z

We can also implement a non-strided view. That would solve your consistency concern.

lindahua · 2013-06-23T12:38:43Z

Even when I and J are arbitrary things (e.g. integer vector), we can still make a view by maintaining references to I and J.

So we will have a variety of subarrays, contiguous, strided, and more generic one that simply maintain references to given indexers.

Let's first focus on rewriting sub function to make it always return a view of some kind and just keep the copying behavior of a[I, J] for now.

ViralBShah · 2013-06-23T17:11:19Z

The thing about non-strided views is that it will have extra indexing overhead, and will be slower. The default can always be views, and the user has to make copies explicitly in some cases.

I agree that we should make sub work fast to start with.

timholy · 2015-01-03T20:40:44Z

I suspect this can be closed now.

lindahua mentioned this issue Jun 22, 2013

RFC: add SubVector (fast 1d slices) #3224

Closed

lindahua mentioned this issue Jun 23, 2013

RFC: Improved implementation of SubArray #3503

Closed

simonster mentioned this issue Nov 19, 2013

Roadmap for 0.3 #4853

Closed

21 tasks

JeffBezanson mentioned this issue Dec 15, 2013

room for performance improvement for SubArray #5117

Closed

timholy closed this as completed Jan 3, 2015

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Re-working SubArray #3496

Re-working SubArray #3496

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

ViralBShah commented Jun 23, 2013

timholy commented Jun 23, 2013

lindahua commented Jun 23, 2013

ViralBShah commented Jun 23, 2013

timholy commented Jan 3, 2015

Re-working SubArray #3496

Re-working SubArray #3496

Comments

lindahua commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

lindahua commented Jun 22, 2013

timholy commented Jun 22, 2013

timholy commented Jun 22, 2013

ViralBShah commented Jun 23, 2013

timholy commented Jun 23, 2013

lindahua commented Jun 23, 2013

ViralBShah commented Jun 23, 2013

timholy commented Jan 3, 2015