Updates to findmin/findmax: function application over multidims #45061

andrewjradcliffe · 2022-04-22T21:55:09Z

This adds support for findmin (and findmax) syntax: findmin(f, A;
dims). It is a natural extension of the extant findmin versions,
and required only small changes to the core algorithm (findminmax!).
However, it was necessary to redirect the 2-arg version in
reduce.jl in order to provide a single call site for
_findmin(f, A, :).

This adds support for findmin (and findmax) syntax: findmin(f, A; dims). It is a natural extension of the extant findmin versions, and required only small changes to the core algorithm (findminmax!). However, it was necessary to redirect the 2-arg version in reduce.jl in order to provide a single call site for _findmin(f, A, :).

andrewjradcliffe · 2022-04-23T02:39:30Z

The 2 fails from SparseArrays/sparsevector. can be fixed by adding
@eval $fun(x::AbstractSparseVector{T}) where {T} = $fun(identity, x) to (at time of writing) line 1440 in SparseArrays/src/sparsevector.jl -- my thought is that is is a fair dispatch to add, given the analogous dispatches for AbstactSparseMatrixCSC. Testing on a local build demonstrated that this resolved the issue, but I do not know quite how to proceed (this has become a complicated first PR ^_^;;). I have submitted a PR for the proposed change in SparseArrays (JuliaSparse/SparseArrays.jl#94). Assuming that PR goes through, I think this PR will be non-failing from the Test perspective.

However, the errors in SparseArrays made me aware of the possible need to cover findmin(f, A::AbstractSparseMatrixCSC; dims), and findmin(A::AbstractSparseMatrixCSC; dims). Note that the current the dispatch in SparseArrays is not mangled by the above proposal. To my knowledge, the findmin(A::AbstractSparseMatrixCSC, region::Union{Integer,Tuple{Integer},NTuple{2,Integer}}) is not exactly publicly documented, so perhaps this is not so much of an issue for JuliaLang:master as is is for SparseArrays.

I would appreciate input on this -- inspection of the code in SparseArrays reveals that it would not be too difficult to handle. In fact, this PR's code works on the SparseArrays (but it's far less efficient, and the returned type is a SparseMatrix, which is part of the problem).
Using the code from this PR:

julia> y = sprand(Float64, 100,100, 0.3);

julia> findmax(y, (1,)) == findmax(identity, y, dims=(1,))
true

julia> findmax(y, (1,2)) == findmax(identity, y, dims=(1,2))
true

julia> findmax(y, (2,)) == findmax(identity, y, dims=(2,))
true

julia> y = sprand(Float64, 100,100, 0.001);

julia> findmax(y, 1) == findmax(identity, y, dims=1)
true

julia> findmax(y, (1,2)) == findmax(identity, y, dims=(1,2))
true

julia> findmax(y, (2,)) == findmax(identity, y, dims=(2,))
true

julia> @benchmark findmax(y, 2)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.758 μs …  1.034 ms  ┊ GC (min … max): 0.00% … 99.48%
 Time  (median):     2.896 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   3.360 μs ± 12.025 μs  ┊ GC (mean ± σ):  6.52% ±  2.20%

 Memory estimate: 3.41 KiB, allocs estimate: 41.

julia> @benchmark findmax(identity, y, dims=2)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  69.196 μs … 305.165 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     74.247 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   74.130 μs ±   3.145 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

 Memory estimate: 2.36 KiB, allocs estimate: 13.

julia> typeof(findmax(y, 2))
Tuple{Matrix{Float64}, Matrix{CartesianIndex{2}}}

julia> typeof(findmax(identity, y, dims=2))
Tuple{SparseMatrixCSC{Float64, Int64}, Matrix{CartesianIndex{2}}}

Thus, I believe that there's a need to handle this a bit more elegantly than just adding a dispatch such as _findmin(f, A::AbstractSparseMatrixCSC, dims) = findmin(A, dims), which ignores the function argument completely. From a quick glance, it should be possible to enable a function argument in the same manner as for non-sparse arrays. I will work on this while awaiting feedback.

andrewjradcliffe · 2022-04-24T23:47:32Z

Well, other than a test failure due to timeout on x86_x64-apple-darwin, the third revision looks like it does not have an outstanding problems.
A detailed description of thoughts on changes to the findmin/findmax interface and dispatches can be found at: JuliaSparse/SparseArrays.jl#94 (comment)

In hindsight, this was probably better posted as a comment on this issue, thus, I reproduce it here for posterity, with edits for relevance.

Current Base interface + dispatches for findmin/findmax:

findmax(A::AbstractArray; dims=:) = _findmax(A, dims)
findmax(f, domain) = mapfoldl( ((k, v),) -> (f(v), k), _rf_findmax, pairs(domain) )
findmax(itr) = _findmax(itr, :)
_findmax(a, ::Colon) = findmax(identity, a)
_findmax(A, region) # region being Union{Integer, NTuple{M, Integer}} where {M}

In the above scheme, the call to findmax(A) (with A::AbstractSparseVector) would have first been through findmax(A::AbstractArray; dims=:), which would redirect to _findmin(a, ::Colon), thereby calling findmax(identity, a), which then dispatches to findmax(f, A::AbstractSparseVector{T}) where {T}
While this works, it precludes the use of a function argument for multidimensional findmin/findmax. Hence, in the original commit, I was proposing the interface + dispatches described below:

Option 1 (not used) of new Base interface + dispatches for findmin/findmax:

findmax(A::AbstractArray; dims=:) = _findmax(identity, A, dims)
findmax(f, A::AbstractArray; dims=:) = _findmax(f, A, dims)
findmax(f, domain) = _findmax(f, domain, :)
findmax(itr) = findmax(identity, itr)
_findmax(f, domain, ::Colon) = mapfoldl( ((k, v),) -> (f(v), k), _rf_findmax, pairs(domain) )
_findmax(f, A, region) # region being Union{Integer, NTuple{M, Integer}} where {M}

Under those changes, the call to findmax(A) (with A::AbstractSparseVector) never has the chance to dispatch on findmax(f, A::AbstractSparseVector). I must admit, the whole situation is awkward. If we consider a hypothetical new user, adding a dispatch which provides an optimized method for findmax for some subtype of AbstractArray, they would probably think that adding findmax(f, A::SomeSpecialArray) would be sufficient. My change would force them to also add findmax(A::SomeSpecialArray) = findmax(identity, A).

Instead, it seems preferable to keep the original interface + dispatch intact, while incorporating the new ideas.

Option 2 (current proposal) of new Base interface + dispatches for findmin/findmax:

findmax(A::AbstractArray; dims=:) = _findmax(A, dims)
findmax(f, A::AbstractArray; dims=:) = _findmax(f, A, dims)
findmax(f, domain) = _findmax(f, domain, :)
findmax(itr) = _findmax(itr, :) # or, equivalently, findmax(identity, itr)
_findmax(a, ::Colon) = findmax(identity, a)
_findmax(A, dims) = _findmax(identity, A, dims)
_findmax(f, A, ::Colon) = mapfoldl( ((k, v),) -> (f(v), k), _rf_findmax, pairs(domain) )
_findmax(f, A, region) # region being Union{Integer, NTuple{M, Integer}} where {M}

This approach allows our hypothetical user to get the behavior they expect by defining just findmax(f, A::SomeSpecialArray).

Thoughts:

Option 1 will be more disturbing to users which are already relying on the extant behavior to define new dispatches.
Deducing that Option 1 necessitates findmax(A::SomeSpecialArray) = findmax(identity, A) is probably non-trivial for a newer user. For experienced users, it begs the question: why is this not handled through a generic?

Having worked through this with a clear head, I don't see much reason to consider anything other than Option 2. The PR has been amended to reflect this.

Edit: add some backticks.

timholy

This looks useful and seems to be a good implementation. It would need tests and probably NEWS.

timholy · 2022-04-27T12:36:08Z

base/reducedim.jl

@@ -200,6 +200,9 @@ function reducedim_init(f::ExtremaMap, op::typeof(_extrema_rf), A::AbstractArray
 return reducedim_initarray(A, region, v0, Tuple{Tmin,Tmax})
 end

+# Why is this defined for `max` but not also `min`?
+# It leads to improper behavior, e.g. maximum(abs2, Matrix{Int}(undef, 0, 1), dims=1)
+# Moreover, reducedim_init (above) handles the cases acceptably.


I am not sure. If you think you can delete this, go for it.

IIUC, this branch is equivalent with the general fallback except for it allows empty reduction.
We have

mapreduce_empty(f::typeof(abs), ::typeof(max), T) = abs(zero(T)) mapreduce_empty(f::typeof(abs2), ::typeof(max), T) = abs2(zero(T))

in Base. So I believe this is needed to keep consistence.

The odd thing is the return value.

julia> maximum(abs2, Matrix{Int}(undef, 0, 1), dims=1) 1×1 Matrix{Int64}: 0

What led me to this discovery was the potential need to match the behavior with findmax; currently I do not, as there is no sensible index which could be returned (while also returning a non-empty array).

Whether the lines can be removed, I am not sure. I suspect that a separate PR would be necessary to handle any problems the removal might cause (I'd rather not hold this PR up). Consequently, I am removing the comment form this PR.

andrewjradcliffe · 2022-04-27T19:50:42Z

This looks useful and seems to be a good implementation. It would need tests and probably NEWS.

Let me know whether the additions/changes in /test/reducedim.jl look OK. If needed, more can be added.

andrewjradcliffe · 2022-05-16T19:15:01Z

I have not abandoned this, but insofar as I can tell, the PR is OK. The failing test does not seem to be related to the proposed changes. Incidentally, feedback on the added tests would be helpful.

andrewjradcliffe added 2 commits April 22, 2022 14:44

Fixes the errors thrown by the tests in test/reducedim.jl

567cb77

Potentially less disturbing interface + dispatches

23dc6b6

andrewjradcliffe added 3 commits April 26, 2022 19:08

Note oddity -- see comment in file

a5287be

Update docstrings

d4c0c9a

More robust type inference for non-empty containers

87cbb4d

timholy added the needs tests Unit tests are required for this change label Apr 27, 2022

timholy reviewed Apr 27, 2022

View reviewed changes

andrewjradcliffe added 3 commits April 27, 2022 12:05

Test additions; some reorganization using @testset

2e1ae02

squash! Test additions; some reorganization using @testset

ad19d8c

Remove comment; to revisit separately

755b5fe

Add a very minor twist on NaN testing

e0679ac

andrewjradcliffe requested a review from timholy May 6, 2022 18:40

Add a not-so-challening output type inference test

69fb2a1

Add a few more tests of output type inference

fb02600

oscardssmith added the status:triage This should be discussed on a triage call label May 18, 2022

debug: retrigger CI

665b479

oscardssmith removed the status:triage This should be discussed on a triage call label May 26, 2022

oscardssmith merged commit e65a4af into JuliaLang:master May 26, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Updates to findmin/findmax: function application over multidims #45061

Updates to findmin/findmax: function application over multidims #45061

andrewjradcliffe commented Apr 22, 2022

andrewjradcliffe commented Apr 23, 2022

andrewjradcliffe commented Apr 24, 2022 •

edited

Loading

timholy left a comment

timholy Apr 27, 2022

N5N3 Apr 27, 2022

andrewjradcliffe Apr 27, 2022

andrewjradcliffe commented Apr 27, 2022

andrewjradcliffe commented May 16, 2022

Updates to findmin/findmax: function application over multidims #45061

Updates to findmin/findmax: function application over multidims #45061

Conversation

andrewjradcliffe commented Apr 22, 2022

andrewjradcliffe commented Apr 23, 2022

andrewjradcliffe commented Apr 24, 2022 • edited Loading

timholy left a comment

Choose a reason for hiding this comment

timholy Apr 27, 2022

Choose a reason for hiding this comment

N5N3 Apr 27, 2022

Choose a reason for hiding this comment

andrewjradcliffe Apr 27, 2022

Choose a reason for hiding this comment

andrewjradcliffe commented Apr 27, 2022

andrewjradcliffe commented May 16, 2022

andrewjradcliffe commented Apr 24, 2022 •

edited

Loading