Indexing performance #11595

Closed
mbauman opened this issue Jun 5, 2015 · 32 comments
Labels
performance (Must go faster) · regression (Regression in behavior compared to a previous version)

Comments

@mbauman
Member

mbauman commented Jun 5, 2015

Somewhere within a rebase during the month of May, I lost some of the performance I had been expecting to see in #10525. I had been tracking the output of test/perf/array/perf.jl over the life of the issue, and periodically creating new branches locally as I rebased to keep up with master. So I was able to go back in time to a few snapshots and look at some key metrics from the perf test. Here's a quick dump of what I found. I won't be able to spend more time on this today, but I will try to resolve this over the weekend.

julia,sumeltIs ArrayLS,6.971857,12.938212,8.154441,1.144733
julia,sumeachIs ArrayLS,3.964500,8.134745,4.595888,0.732331
julia,sumlinearIs ArrayLS,28.147964,37.045052,29.551360,2.039010
  • mb/abstractsmarts10-updated is a straightforward rebase of that branch to master-just-before-10525-was-merged. It's a little different from master (it won't pass the tests, for one), but the performance is very similar. I wanted to eliminate any possible difference.
julia,sumeltIs ArrayLS,30.437162,40.237688,32.703086,2.072026
julia,sumeachIs ArrayLS,25.994786,43.018392,29.987360,3.992816
julia,sumlinearIs ArrayLS,3.178545,7.702495,4.153768,0.829394 # This is wild

Sorry for the false alarm @timholy, the commit I flagged in #5117 was totally unrelated and simply due to some dirty files I had. It's something else.

@mbauman
Member Author

mbauman commented Jun 5, 2015

Anyone know how to do an automated git bisect of a rebase? Perhaps using git rerere to automate the conflict resolution?

@pao
Member

pao commented Jun 5, 2015

Create a test branch with your commit reordered early into the history, back when you originally proposed it? Effectively, this would rebase everything since then on top of #10525; you can then bisect that branch.

@mbauman
Member Author

mbauman commented Jun 6, 2015

Ah, yes, that was the trick. Thanks for the idea, @pao. This answer makes a lot more sense: 677f537 is the first bad commit

switch inlining costs to use scaled integer math. this removes the need for floating point numbers in inference.

I'll take a look at it more in depth tomorrow.

@vtjnash
Member

vtjnash commented Jun 6, 2015

Unless I made a math error, that commit strictly switches from floating-point math to scaled integer math (with a scale factor of 1000) and should not have significantly changed the thresholds.

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

(a link to the diff line below, which GitHub did not expand)

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

Ahh, GitHub doesn't automatically expand the diff....

Let's try again.

if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2)) ||

@vtjnash

@vtjnash
Member

vtjnash commented Jun 6, 2015

@yuyichao I'm not sure what you're pointing at on that line, since it doesn't deal with inlining costs, nor did it change in the bisected commit mentioned above.

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

That is the only call to inline_worthy that you haven't touched. The parameter passed in is not scaled by 1000. In case something goes wrong again, here's the line:

 if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2)) ||

Edit: in other words:

diff --git a/base/inference.jl b/base/inference.jl
index f03d3aa..d6f3a81 100644
--- a/base/inference.jl
+++ b/base/inference.jl
@@ -2570,7 +2570,7 @@ function inlineable(f::ANY, e::Expr, atype::ANY, sv::StaticVarInfo, enclosing_as
             end
         end
         free = effect_free(aei,sv,true)
-        if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2)) ||
+        if ((occ==0 && is(aeitype,Bottom)) || islocal || (occ > 1 && !inline_worthy(aei, occ*2000)) ||
                 (affect_free && !free) || (!affect_free && !effect_free(aei,sv,false)))
             if occ != 0 # islocal=true is implied by occ!=0
                 if !islocal
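
For readers following along, here's a minimal, self-contained sketch of the bug (hypothetical names and numbers, not the actual inference.jl code): once inline_worthy takes costs as integers scaled by 1000, any call site that still passes a threshold in the old float units is effectively a thousand-fold too strict.

const SCALE = 1000   # scale factor from the bisected commit

# Old convention: cost and threshold are floats on the same scale.
inline_worthy_old(cost::Float64, threshold::Float64) = cost <= threshold

# New convention: cost and threshold are integers scaled by 1000.
inline_worthy_new(cost::Int, threshold::Int) = cost <= threshold

occ = 2
c = 2.5   # pretend inlining cost in the old float units

inline_worthy_old(c, occ * 2.0)                      # true:  2.5 <= 4.0
inline_worthy_new(round(Int, c * SCALE), occ * 2)    # false: 2500 <= 4 (unscaled threshold)
inline_worthy_new(round(Int, c * SCALE), occ * 2000) # true:  2500 <= 4000 (the patched call)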

@vtjnash
Member

vtjnash commented Jun 6, 2015

Oops, good catch -- yes, I didn't even see that one.

@mbauman
Member Author

mbauman commented Jun 6, 2015

Ah, wonderful. Thanks guys. That helps a lot. It's still not 100%, but definitely getting closer. On abstractsmarts10-updated + the above patch:

julia,sumeltIs ArrayLS,12.441170,18.845323,14.019387,1.576552
julia,sumeachIs ArrayLS,9.144515,15.143982,10.485381,1.488682
julia,sumlinearIs ArrayLS,28.358809,37.302391,33.664462,2.070900

Inlining is crazy difficult to get right.

@timholy
Member

timholy commented Jun 6, 2015

The memory consumption reported in #5117 (comment) is also consistent with a splatting penalty.

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

One way to avoid the splatting penalty after #10380 is to pass the tuple as an argument directly instead of splatting it as varargs (see the script and output below).

I'm not sure I would recommend this as a long-term solution, though, because IMHO the vararg case will eventually be as fast as (#11248), or even faster than (#5402), the tuple case.

#!/usr/bin/julia -f

using Base: unsafe_getindex, checkbounds

macro time_func(f, args...)
    args = eval(current_module(), Expr(:tuple, args...))::Tuple
    argnames = Symbol[gensym() for i in 1:length(args)]
    types = map(typeof, args)
    quote
        function wrapper($(argnames...))
            $(Expr(:meta, :noinline))
            $f($(argnames...))
        end
        function timing_wrapper()
            println($f, $types)
            wrapper($(args...))
            gc()
            @time for i in 1:10000000
                wrapper($(args...))
            end
            # Profile.print()
            gc()
        end
        timing_wrapper()
    end
end

function checkbounds2{T<:Union(Real,AbstractArray,Colon)}(A::AbstractArray,
                                                          I::Tuple{Vararg{T}})
    Base.@_inline_meta
    n = length(I)
    if n > 0
        for dim = 1:(n-1)
            (Base._checkbounds(size(A,dim), I[dim]) ||
             Base.throw_boundserror(A, I))
        end
        (Base._checkbounds(Base.trailingsize(A,n), I[n]) ||
         Base.throw_boundserror(A, I))
    end
end

@inline function getindex2(V::SubArray, I::Int...)
    checkbounds2(V, I)
    unsafe_getindex(V, I...)
end

function checkbounds3{T<:Union(Real,AbstractArray,Colon)}(A::AbstractArray,
                                                          I::T...)
    Base.@_inline_meta
    n = length(I)
    if n > 0
        for dim = 1:(n-1)
            (Base._checkbounds(size(A,dim), I[dim]) ||
             Base.throw_boundserror(A, I))
        end
        (Base._checkbounds(Base.trailingsize(A,n), I[n]) ||
         Base.throw_boundserror(A, I))
    end
end

@inline function getindex3(V::SubArray, I::Int...)
    checkbounds3(V, I...)
    unsafe_getindex(V, I...)
end

s1 = sub(Float64[1, 2], 1:2)

# @code_llvm getindex2(s1, 1)
# @code_llvm getindex(s1, 1)

@time_func(getindex3, s1, 1)
@time_func(getindex2, s1, 1)
@time_func(getindex, s1, 1)
getindex3(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
   1.560 seconds      (10000 k allocations: 153 MB, 1.13% gc time)
getindex2(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
  88.030 milliseconds
getindex(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
   1.473 seconds      (10000 k allocations: 153 MB, 0.22% gc time)

@timholy
Member

timholy commented Jun 6, 2015

I've dug into this more, applying @yuyichao's patch and also editing checkbounds to always return nothing.
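
The checkbounds edit amounts to something like this (a hedged sketch for the linear-indexing case only; the real method does the per-dimension checks shown in the scripts above):

# Always return `nothing` on success so the return type is uniformly Void,
# removing a source of type instability at call sites.
function checkbounds_void(A::AbstractArray, i::Int)
    1 <= i <= length(A) || throw(BoundsError(A, i))
    return nothing
end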

julia> s1 = sub(Float64[1, 2], 1:2)
2-element SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1}:
 1.0
 2.0

julia> f(s) = s[1]
f (generic function with 1 method)

julia> @code_warntype f(s1)
Variables:
  s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1}

Body:
  begin  # none, line 1:
      checkbounds(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},1)::Void
      return (top(arrayref))((top(getfield))(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},:parent)::Array{Float64,1},(top(box))(Int64,(top(sub_int))((top(box))(Int64,(top(add_int))((top(getfield))(s::SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},:first_index)::Int64,1)),1)))::Float64
  end::Float64

No more type-instability, of course. But notice the call to checkbounds is not inlined.

Using HackThatBase, I discovered the reason: something, somewhere, is inserting a NewvarNode before the meta expression (output of linfo and atypes from within typeinf, indentation of the variable indicates backtrace depth):

                                                      linfo = AST(:($(Expr(:lambda, Any[:A,:(I::Union(Real,AbstractArray,Colon)...)], Any[Any[:n,symbol("#s17"),symbol("#s19"),:dim,symbol("#s18")],Any[Any[:A,:Any,0],Any[:I,:Any,0],Any[:n,:Any,18],Any[symbol("#s17"),:Any,18],Any[symbol("#s19"),:Any,2],Any[:dim,:Any,18],Any[symbol("#s18"),:Any,18]],Any[],2], :(begin  # abstractarray.jl, line 162:
        NewvarNode(symbol("#s17"))
        $(Expr(:meta, :inline)) # line 163:
        n = length(I) # line 164:
        unless n > 0 goto 7 # line 165:
        GenSym(0) = colon(1,n - 1)
        #s19 = (top(start))(GenSym(0))
        unless (top(!))((top(done))(GenSym(0),#s19)) goto 1
        2: 
        GenSym(1) = (top(next))(GenSym(0),#s19)
        dim = (top(getfield))(GenSym(1),1)
        #s19 = (top(getfield))(GenSym(1),2) # line 166:
        #s18 = _checkbounds(size(A,dim),getindex(I,dim))
        unless #s18 goto 4
        goto 5
        4: 
        throw_boundserror(A,I)
        5: 
        3: 
        unless (top(!))((top(!))((top(done))(GenSym(0),#s19))) goto 2
        1: 
        0:  # line 168:
        #s17 = _checkbounds(trailingsize(A,n),getindex(I,n))
        unless #s17 goto 6
        goto 7
        6: 
        throw_boundserror(A,I)
        7:  # line 170:
        return nothing
    end)))))
                                                      atypes = Tuple{MyArray{Float64,1},Int64}
                                                      (in typeinf at none, line 1294)

Any ideas where this is happening, @vtjnash?

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

Sounds like the same issue as in #10980 (comment)....

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

I've also tried manually inlining the call, and it appears to be slower than the tuple-argument version here.

@yuyichao
Contributor

yuyichao commented Jun 6, 2015

With the following method added (full script):

@inline function getindex4(V::SubArray, I::Int...)
    n = length(I)
    if n > 0
        for dim = 1:(n - 1)
            (Base._checkbounds(size(V, dim), I[dim]) ||
             Base.throw_boundserror(V, I))
        end
        (Base._checkbounds(Base.trailingsize(V, n), I[n]) ||
         Base.throw_boundserror(V, I))
    end
    unsafe_getindex(V, I...)
end

Output:

getindex4(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
   2.030 seconds      (30000 k allocations: 458 MB, 1.31% gc time)
getindex3(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
   1.414 seconds      (10000 k allocations: 153 MB, 0.22% gc time)
getindex2(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
  80.872 milliseconds
getindex(SubArray{Float64,1,Array{Float64,1},Tuple{UnitRange{Int64}},1},Int64)
   1.116 seconds      (10000 k allocations: 153 MB, 0.28% gc time)

@timholy
Member

timholy commented Jun 6, 2015

Once we solve the inlining problem, all will be golden. It worked fine just a short while ago.

@mbauman
Member Author

mbauman commented Jun 6, 2015

something, somewhere, is inserting a NewvarNode before the meta expression

Ah, very nice. I'll run another bisect for that.

@timholy
Member

timholy commented Jun 6, 2015

Suspected ebfebfd, CC @JeffBezanson

@mbauman
Member Author

mbauman commented Jun 6, 2015

Yup, confirmed. (Bisects are so fast when there are only two commits to test!)

@mbauman
Member Author

mbauman commented Jun 6, 2015

Well, I tried playing around in the frontend to ensure meta always stays on top, but I am not a lisper. It'll have to wait for someone with better lisp chops than me.

@JeffBezanson added the performance (Must go faster) and regression (Regression in behavior compared to a previous version) labels Jun 8, 2015
@JeffBezanson
Member

I think we'll have to be more robust about looking for metadata. Otherwise multiple features will compete for that top spot in the function body.
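
A hedged sketch of the failure mode (function names hypothetical, not the actual inference.jl logic): a pass that only inspects the first statement of the body for an Expr(:meta) node is blinded by a compiler-inserted NewvarNode, while a scan over the body is not.

is_meta(ex) = isa(ex, Expr) && ex.head === :meta

has_meta_first_stmt(body) = is_meta(body[1])   # naive: only looks at slot 1
has_meta_scan(body)       = any(is_meta, body) # robust: scan the whole body

# Stand-in for the lowered code above: a NewvarNode-like statement
# precedes the :inline annotation.
body = Any[:(local x), Expr(:meta, :inline), :(x = 1)]

has_meta_first_stmt(body)  # false -- the prepended node shadows the meta
has_meta_scan(body)        # true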

timholy added a commit that referenced this issue Jun 8, 2015
This fixes the primary problem in #11595, but is not enough on its
own to recapture full performance. Interestingly, this does not appear to
cause a measurable slowing of the compilation of base.
timholy added a commit that referenced this issue Jun 8, 2015
This addresses most of the remaining problem in #11595, but it's still about
three-fold short of the performance we had before getindex(::SubArray, ...)
checked bounds.
@mbauman
Member Author

mbauman commented Jun 8, 2015

Thank you everyone for helping find the issues here. While we still need to find better performance for SubArrays, I think this issue has served its purpose. We're now back to the timings I was expecting to see for #10525. 👍

@mbauman mbauman closed this as completed Jun 8, 2015
@timholy
Member

timholy commented Jun 8, 2015

I have to say, it's still a little unsatisfying that a week ago I had

julia> include("/tmp/mykern.jl")
follow up: mykern( 2:(siz - 3), s, a, x, y )
  11.795 milliseconds (6 allocations: 192 bytes)
follow up: mykern( 2:(siz - 3), s1, a, x1, y1 )
 108.014 milliseconds (38263 allocations: 1119 KB)  # note: this includes compilation, ignore
follow up: mykern( 3:(siz - 2), s2, a, x2, y2 )
  16.299 milliseconds (6 allocations: 192 bytes)
  16.692 milliseconds (6 allocations: 192 bytes)
  22.398 milliseconds (6 allocations: 192 bytes)

(using the test in #5117 (comment)) and now I have

julia> include("/tmp/mykern.jl")
follow up: mykern( 2:(siz - 3), s, a, x, y )
  10.265 milliseconds (6 allocations: 192 bytes)
follow up: mykern( 2:(siz - 3), s1, a, x1, y1 )
 103.093 milliseconds (31840 allocations: 1266 KB)
follow up: mykern( 3:(siz - 2), s2, a, x2, y2 )
  66.623 milliseconds (6 allocations: 192 bytes)
  67.183 milliseconds (6 allocations: 192 bytes)
  69.709 milliseconds (6 allocations: 192 bytes)

A 6-fold gap is big enough that I'd be reluctant to use SubArrays in real applications (and I have a lot of code that uses them now)---I'd prefer to just copy the data.

I think we should turn bounds-checking off on SubArrays until we can recover something close to the original performance.

@yuyichao
Contributor

yuyichao commented Jun 8, 2015

(I think I've lost track of some of the discussion halfway through, so I'm sorry if this has been answered before.)

What has changed between the good (fast) commit and the bad (slow) commit in your (@timholy) comment above? Do #11617 and 56342be not fix the problems introduced by ebfebfd and 677f537, respectively? Or is there another bad commit to be bisected, or has the performance just gradually decreased over the past week(s)?

@timholy
Member

timholy commented Jun 8, 2015

It's the fact that #10525 turned on bounds-checking for SubArrays, which we've never had before. Bounds-checking is a Good Thing, but the performance hit is not. For some reason our bounds-checking on Array is not causing anything like the same performance hit.

@yuyichao
Contributor

yuyichao commented Jun 8, 2015

Ahh, I see. Makes sense. Thanks.

@mbauman
Member Author

mbauman commented Jun 8, 2015

I think we should turn bounds-checking off on SubArrays until we can recover something close to the original performance.

I definitely agree. I'll work on improving the situation this week, but my time is somewhat limited.

@timholy
Member

timholy commented Jun 8, 2015

No worries. It could be a hard problem; frankly, I expect we'll have to leave it off for 0.4.

@mbauman
Member Author

mbauman commented Jun 8, 2015

On Julia just-before-10525, I'm seeing ~15ms for the last three tests. If I remove the @inbounds annotations, I get ~60-65ms.

On master, I get ~55ms, and @inbounds isn't actually disabling any bounds checks! And there's still room to improve beyond that: even if I rewrite the algorithm to use unsafe_(get|set)index, it still lags behind quite a bit at ~40ms.
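
For context, the shape of the kernel being timed is roughly this (a sketch, not the actual mykern from #5117):

# With a working @inbounds, the checkbounds call inside s[i] is elided;
# on master at this point it was not for SubArrays.
function sumview(s)
    t = 0.0
    @inbounds for i in 1:length(s)
        t += s[i]
    end
    return t
end

s = sub(rand(1000), 2:999)   # 0.4-era syntax for constructing a SubArray
@time sumview(s)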

@mbauman
Member Author

mbauman commented Jun 9, 2015

Hm, I don't understand why unsafe_get/setindex isn't giving us the performance we want here. If I simply remove checkbounds from SubArray's getindex, this gives the pre-10525 performance.

Well, at a minimum this is a very easy change to restore the old performance.
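
Roughly, that easy change looks like the following (a sketch of the idea only, not the actual patch):

using Base: unsafe_getindex

# SubArray indexing without the checkbounds call, i.e. the pre-#10525
# (unchecked) behavior.
@inline function getindex_unchecked(V::SubArray, I::Int...)
    unsafe_getindex(V, I...)
end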

@mbauman
Member Author

mbauman commented Jun 9, 2015

It's pretty ugly to rewrite mykern in terms of unsafe_* instead of @inbounds, but it is possible to restore the original performance here with #11625.
