extrema() is missing [,dims] #15376

bjarthur · 2016-03-06T14:48:14Z

maximum, minimum, unique, sum, etc. all have the option to reduce across dimensions, but not extrema. the fastest i've come up with is:

function extrema(a::AbstractArray, dim)
    dim = tuple(dim...)
    mi = squeeze(minimum(a,dim), dim)
    ma = squeeze(maximum(a,dim), dim)
    reshape(collect(zip(mi,ma)), size(mi))
end

i feel there must be a way to combine the loop repetition inside the calls to maximum and minimum, but i'm not even a white belt in Functor Fu. this, for example, is much slower:

extrema(a,dim) = squeeze( mapslices(extrema,a,dim), dim)

anyone got any better ideas? or should i submit a PR? thanks.

The text was updated successfully, but these errors were encountered:

tkelman · 2016-03-07T11:56:34Z

Are you comparing performance on master or 0.4?

bjarthur · 2016-03-07T13:18:48Z

the differences between master and 0.4 are negligible:

julia> VERSION
v"0.4.3"

julia> foo=rand(100,101,102);

julia> function extrema1(a::AbstractArray, dim)
           dim = tuple(dim...)
           mi = squeeze(minimum(a,dim), dim)
           ma = squeeze(maximum(a,dim), dim)
           reshape(collect(zip(mi,ma)), size(mi))
       end
extrema1 (generic function with 1 method)

julia> @time extrema1(foo,1);
  0.530319 seconds (433.87 k allocations: 20.979 MB)

julia> @time extrema1(foo,1);
  0.003566 seconds (40 allocations: 323.516 KB)

julia> @time extrema1(foo,1);
  0.003971 seconds (40 allocations: 323.516 KB)

julia> @time extrema1(foo,1);
  0.004340 seconds (40 allocations: 323.516 KB)

julia> extrema2(a::AbstractArray,dim) = squeeze( mapslices(extrema,a,dim), dim)
extrema2 (generic function with 1 method)

julia> @time extrema2(foo,1);
  0.972634 seconds (1.10 M allocations: 53.656 MB, 1.17% gc time)

julia> @time extrema2(foo,1);
  0.097835 seconds (463.67 k allocations: 22.483 MB, 4.42% gc time)

julia> @time extrema2(foo,1);
  0.088196 seconds (463.67 k allocations: 22.483 MB, 3.66% gc time)

julia> @time extrema2(foo,1);
  0.094305 seconds (463.67 k allocations: 22.483 MB, 3.06% gc time)

julia> VERSION
v"0.5.0-dev+3038"

julia> foo=rand(100,101,102);

julia> function extrema1(a::AbstractArray, dim)
                  dim = tuple(dim...)
                  mi = squeeze(minimum(a,dim), dim)
                  ma = squeeze(maximum(a,dim), dim)
                  reshape(collect(zip(mi,ma)), size(mi))
              end
extrema1 (generic function with 1 method)

julia> @time extrema1(foo,1);
  0.732535 seconds (435.69 k allocations: 22.081 MB, 0.96% gc time)

julia> @time extrema1(foo,1);
  0.003764 seconds (34 allocations: 323.172 KB)

julia> @time extrema1(foo,1);
  0.005146 seconds (34 allocations: 323.172 KB)

julia> @time extrema1(foo,1);
  0.005158 seconds (34 allocations: 323.172 KB)

julia> extrema2(a::AbstractArray,dim) = squeeze( mapslices(extrema,a,dim), dim)
extrema2 (generic function with 1 method)

julia> @time extrema2(foo,1);
  1.082338 seconds (1.28 M allocations: 62.019 MB, 1.44% gc time)

julia> @time extrema2(foo,1);
  0.088977 seconds (494.56 k allocations: 23.426 MB, 4.14% gc time)

julia> @time extrema2(foo,1);
  0.080576 seconds (494.56 k allocations: 23.426 MB, 3.93% gc time)

julia> @time extrema2(foo,1);
  0.085413 seconds (494.56 k allocations: 23.426 MB, 3.21% gc time)

timholy · 2016-03-07T16:14:49Z

If you want to try implementing it, this blog post should get you well on your way. See also code in reducedims for an optimization when the first dimension is one that you're reducing along. (In that case, you don't have to store the result in the array, you can use a temporary variable.)

bjarthur · 2016-03-14T11:48:12Z

i've made four versions of extrema with a dims argument, and the functor version i first posted is faster than those using mapslices, cartesian, and CartesianRange. see this gist.

i'm guessing CartestianRange is slow because of #9080.

should i submit a PR for the functor version?

note that this helps to resolve #3893.

timholy · 2016-03-15T12:44:46Z

I'd focus on the CartesianRange version. The effect in #9080 is modest, but I bet you're seeing a big slowdown. In your gist, Istart is not a type-stable variable: it starts life as a Vector{Int} and then morphs into a CartesianIndex{N}. On its own that would kill performance, but here you can't just solve it by variable renaming because there's a second issue: N is not inferrable because Vectors do not encode their length in their type parameters. (If you haven't discovered it yet, @code_warntype is a lovely tool. Allocate appropriate A and B and then use @code_warntype extrema_cartesianrange!(B, A). Then look for red. Not all red is evil, but anything in the "variables" section marked in red indicates an almost-certain problem. See the docs for more detail.)

The easiest approach would be not to worry about repeating work on the i=1 components, so that Istart = first(RA) with RA = CartesianRange(sA). You could alternatively use Istart = CartesianIndex(ntuple(f, Val{N})) where N has been extracted as the dimensionality of A and B in the function declaration and f is a suitably-defined anonymous function. This will be type-stable on julia-0.5 (but not 0.4) thanks to Jeff's recent work.

bjarthur · 2016-03-17T22:03:26Z

thanks @timholy for the tip on type stability. since the cartesian version was already stable, it was easiest to just add an @inbounds to it. it is now as fast as the functor version, and uses half the memory. i will submit a PR shortly.

vtjnash added the good first issue Indicates a good issue for first-time contributors to Julia label Mar 8, 2016

bjarthur closed this as completed Mar 17, 2016

bjarthur mentioned this issue Mar 18, 2016

added extrema(A,dims) #15550

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

extrema() is missing [,dims] #15376

extrema() is missing [,dims] #15376

bjarthur commented Mar 6, 2016

tkelman commented Mar 7, 2016

bjarthur commented Mar 7, 2016

timholy commented Mar 7, 2016

bjarthur commented Mar 14, 2016

timholy commented Mar 15, 2016

bjarthur commented Mar 17, 2016

extrema() is missing [,dims] #15376

extrema() is missing [,dims] #15376

Comments

bjarthur commented Mar 6, 2016

tkelman commented Mar 7, 2016

bjarthur commented Mar 7, 2016

timholy commented Mar 7, 2016

bjarthur commented Mar 14, 2016

timholy commented Mar 15, 2016

bjarthur commented Mar 17, 2016