
Speed up mapslices #40996

Merged: 18 commits merged into JuliaLang:master on May 31, 2022
Conversation

mcabbott (Contributor) commented May 29, 2021

This does a little tidying up of mapslices, to work internally with tuples instead of mutating arrays of indices, and to avoid broadcasting in the inner loop where possible. There should be no functional changes. It's still not type stable, but it is a bit faster:

julia> @btime mapslices(cumsum, $(rand(10,100)), dims=1);
  26.250 μs (631 allocations: 39.47 KiB)  # master
  4.756 μs (116 allocations: 22.45 KiB)   # this PR

julia> @btime reduce(hcat, map(cumsum, eachcol($(rand(10,100)))));  # perhaps preferred?
  2.898 μs (102 allocations: 22.88 KiB)

julia> @btime cumsum($(rand(10,100)), dims=1); # for reference, without slices
  409.091 ns (1 allocation: 7.94 KiB)

julia> @btime mapslices(prod, $(rand(100,10)), dims=2);
  25.500 μs (835 allocations: 23.25 KiB)  # master
  1.433 μs (21 allocations: 1.50 KiB)     # this PR

julia> @btime map(prod, eachrow($(rand(100,10))));  # certainly better!
  526.393 ns (1 allocation: 896 bytes)

julia> @btime prod($(rand(100,10)), dims=2); # for reference, without slices
  292.286 ns (5 allocations: 976 bytes)
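
For illustration, here is a minimal sketch of the tuple-of-indices idea described above, hard-coded to the simplest case (dims=1 on a matrix, with f returning a vector); mapslices1_sketch is an invented name for this sketch, not the PR's implementation:

function mapslices1_sketch(f, A::AbstractMatrix)
    r1 = f(view(A, :, 1))                   # first slice fixes the output eltype and slice length
    out = similar(r1, (size(r1, 1), size(A, 2)))
    out[:, 1] .= r1
    for j in 2:size(A, 2)
        idx = (Colon(), j)                   # a tuple of indices, not a mutated Vector of indices
        out[idx...] = f(view(A, idx...))     # write the whole slice at once; no broadcast in the inner loop
    end
    return out
end

A = rand(10, 100);
mapslices1_sketch(cumsum, A) == mapslices(cumsum, A, dims=1)  # true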

dkarrasch added the performance (Must go faster) label on May 29, 2021
base/abstractarray.jl (outdated review thread)
mcabbott (Contributor Author) commented Sep 1, 2021

Bump?

Comment on lines 2780 to 2868
if r1 isa AbstractArray && ndims(r1) > 0
    # f returned an array: any dimensions of r1 beyond the sliced ones must be singletons
    n = sum(dim_mask)
    if ndims(r1) > n && any(ntuple(d -> size(r1,d+n)>1, ndims(r1)-n))
        s = size(r1)[1:n]
        throw(DimensionMismatch("cannot assign slice f(x) of size $(size(r1)) into output of size $s"))
    end
    res1 = r1
else
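
For illustration, a hypothetical call that would trip the shape check in the excerpt above (assuming this PR's mapslices): each dims=1 slice of a 2×3 matrix is a 2-vector, and x .+ x' turns it into a 2×2 matrix, so f(x) has an extra non-singleton trailing dimension.

mapslices(x -> x .+ x', rand(2, 3), dims=1)
# throws DimensionMismatch("cannot assign slice f(x) of size (2, 2) into output of size (2,)")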
mcabbott (Contributor Author) commented:

Changing this loop to res1 = if ... else ... end seems to make the function type-stable.

However, it also produces a strange inference complaint:

julia> @inferred mapslices(cumsum, [1 2; 3 4], dims=1)  # unrelated example, now works fine
2×2 Matrix{Int64}:
 1  2
 3  4

julia> mapslices(tuple, [1 2; 3 4], dims=1)  # desired result, but does not infer
1×2 Matrix{Tuple{Vector{Int64}}}:
 ([1, 3],)  ([2, 4],)

julia> @inferred mapslices(tuple, [1 2; 3 4], dims=1)
ERROR: return type Matrix{Tuple{Vector{Int64}}} does not match inferred return type Matrix
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] top-level scope
   @ REPL[121]:1

julia> mapslices(x -> tuple(x), [1 2; 3 4], dims=1)  # wtf?
ERROR: fatal error in type inference (type bound)
Stacktrace:
 [1] mapslices(f::var"#278#279", A::Matrix{Int64}; dims::Int64)
   @ Main ./REPL[107]:31
 [2] top-level scope
   @ REPL[122]:1
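
For reference, the if/else binding pattern mentioned above looks roughly like this; a minimal sketch with an invented helper name (firstslice_result) and an assumed scalar-wrapping else branch, not the PR's actual code:

function firstslice_result(r1)
    # one expression bound to res1, rather than a separate assignment in each branch
    res1 = if r1 isa AbstractArray && ndims(r1) > 0
        r1            # keep an array-valued f(x) as-is
    else
        [r1]          # assumption for this sketch: wrap scalar-like results in a Vector
    end
    return res1
end

firstslice_result(cumsum([1, 3]))  # [1, 4]
firstslice_result(6)               # [6]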

mcabbott (Contributor Author) commented:

(Same "fatal error in type inference" on master as of 13 December; I rebased to re-run CI.)

mcabbott (Contributor Author) commented:

Tests which hit this are marked broken in 62068c2

mcabbott (Contributor Author) commented:

(Apparently imperfectly, though!)

mcabbott (Contributor Author) commented Feb 9, 2022

Pre-1.8 bump.

This still hit #43064 when I last checked, but IIRC reverting 90cc027 would be enough to avoid that, at the cost of losing type stability.

Edit: still fails after #43923.

base/abstractarray.jl (outdated review thread)
test/arrayops.jl (outdated review thread)
Comment on lines 1202 to 1203
# issue #21123
@test mapslices(nnz, sparse(1.0I, 3, 3), dims=1) == [1 1 1]
mcabbott (Contributor Author) commented:

Tests like this must move to https://github.com/JuliaSparse/SparseArrays.jl now, right? So I will remove them here.

mcabbott (Contributor Author) commented:

So that I don't lose them, they are:

    # issue #21123
    @test mapslices(nnz, sparse(1.0I, 3, 3), dims=1) == [1 1 1]

    r = rand(Int8, 4,5,2)
    @test vec(mapslices(repr, r, dims=(2,1))) == map(repr, eachslice(r, dims=3))
    @test mapslices(prod, sparse(r[:,:,1]), dims=1) == prod(r[:,:,1], dims=1)
    @test mapslices(cumsum, sparse(r[:,:,1]), dims=1) == cumsum(r[:,:,1], dims=1)

mcabbott (Contributor Author) commented Feb 9, 2022

BTW, the same benchmarks as in the top message, run on the same computer, are significantly slower on today's master. I hope this regression is tracked elsewhere? The relative improvement from this PR is still good:

julia> @btime mapslices(cumsum, $(rand(10,100)), dims=1);
  min 55.250 μs, mean 60.423 μs (630 allocations, 37.62 KiB)  # master
  min 8.111 μs, mean 9.557 μs (114 allocations, 22.45 KiB)  # this PR
  
julia> @btime reduce(hcat, map(cumsum, eachcol($(rand(10,100)))));  # perhaps preferred?
  min 4.875 μs, mean 6.207 μs (102 allocations, 22.88 KiB)

julia> @btime cumsum($(rand(10,100)), dims=1); # for reference, without slices
  min 699.885 ns, mean 1.865 μs (1 allocation, 7.94 KiB)

julia> @btime mapslices(prod, $(rand(100,10)), dims=2);
  min 57.458 μs, mean 61.089 μs (834 allocations, 22.91 KiB)  # master
  min 2.787 μs, mean 2.931 μs (18 allocations, 1.48 KiB)  # this PR

julia> @btime map(prod, eachrow($(rand(100,10))));  # certainly better!
  min 938.259 ns, mean 978.073 ns (1 allocation, 896 bytes)

julia> @btime prod($(rand(100,10)), dims=2); # for reference, without slices
  min 510.854 ns, mean 553.738 ns (5 allocations, 976 bytes)

oscardssmith (Member) commented:

We apparently should have this as one of the tests @nanosoldier runs.

mcabbott (Contributor Author) commented May 7, 2022

Bump?

This is nearly a year old, and seems an uncontroversial improvement.

At present it hits #43064, although rarely. I can remove one commit (and perhaps leave a comment) to avoid that, at the cost of losing type stability. It would still be a large speedup without it.

oscardssmith (Member) commented:

I want this, can we just merge it?

ararslan (Member) commented May 9, 2022

Bumping the requested reviewers: @JeffBezanson, @mbauman

mcabbott (Contributor Author) commented:

Another bump? More than a year...

oscardssmith merged commit 3eaed8b into JuliaLang:master on May 31, 2022
oscardssmith (Member) commented:

happy late birthday #40996 🎂

Labels: domain:arrays, performance (Must go faster)
4 participants