
Make findall faster for AbstractArrays #37177

Merged: 1 commit merged into master on Dec 18, 2020
Conversation

nalimilan (Member)

The `findall` fallback is quite slow when the predicate is a small function, compared with generating a logical index using `broadcast` and calling `findall` on it to compute integer indices. The gain is most visible when the predicate is true for a large proportion of entries, but it is there even when all of them are false.
The drawback of this approach is that it requires allocating a vector of `length(a)/8` bytes regardless of the number of returned indices.

Some benchmarks using `>` and `==`, for which the impact of using `broadcast` should be the most visible thanks to SIMD. Note the timing differences when changing the proportion of true entries.

using BenchmarkTools

findall2(testf::Function, A::AbstractArray) = findall([testf(x)::Bool for x in A])
findall3(testf::Function, A::AbstractArray) = findall(testf.(A))

x = rand(10_000_000);

# Current state
julia> @btime findall(>(0.5), x);
  148.785 ms (24 allocations: 65.00 MiB)

julia> @btime findall(>(0.8), x);
  55.894 ms (22 allocations: 17.00 MiB)

julia> @btime findall(>(0.95), x);
  21.648 ms (20 allocations: 5.00 MiB)

julia> @btime findall(==(0.5), x);
  18.071 ms (1 allocation: 80 bytes)

# Using a comprehension
julia> @btime findall2(>(0.5), x);
  87.761 ms (5 allocations: 47.68 MiB)

julia> @btime findall2(>(0.8), x);
  44.411 ms (5 allocations: 24.78 MiB)

julia> @btime findall2(>(0.95), x);
  28.501 ms (5 allocations: 13.34 MiB)

julia> @btime findall2(==(0.5), x);
  23.160 ms (4 allocations: 9.54 MiB)

# Using broadcast (this PR)
julia> @btime findall3(>(0.5), x);
  13.709 ms (8 allocations: 39.34 MiB)

julia> @btime findall3(>(0.8), x);
  10.507 ms (8 allocations: 16.44 MiB)

julia> @btime findall3(>(0.95), x);
  9.702 ms (8 allocations: 5.00 MiB)

julia> @btime findall3(==(0.5), x);
  7.945 ms (7 allocations: 1.20 MiB)

@@ -2404,7 +2408,8 @@ function findall(pred::Fix2{typeof(in),<:Union{Array{<:Real},Real}}, x::Array{<:
 end
 # issorted fails for some element types so the method above has to be restricted
 # to element with isless/< defined.
-findall(pred::Fix2{typeof(in)}, x::Union{AbstractArray, Tuple}) = _findin(x, pred.x)
+findall(pred::Fix2{typeof(in)}, x::AbstractArray) = _findin(x, pred.x)
Contributor (review comment on the diff)

unrelated change?

nalimilan (Member Author)

I think it's required to avoid an ambiguity, since ::Fix2 is more specific than ::Function, but ::AbstractArray is more specific than ::Union{AbstractArray, Tuple}.
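A sketch of that ambiguity (hypothetical function name `g`, mirroring the two signatures involved, not the Base definitions): with one method more specific in the first argument and the other more specific in the second, neither dominates and the call errors.

```julia
using Base: Fix2

# `g` stands in for `findall` here; the two methods mirror the
# pre-existing Fix2 method and the new AbstractArray fallback.
g(pred::Fix2{typeof(in)}, x::Union{AbstractArray,Tuple}) = :fix2_union
g(pred::Function, x::AbstractArray) = :function_array

# For the call g(in([1, 2]), [1, 2, 3]):
#  - in arg 1, Fix2{typeof(in)} is more specific than Function;
#  - in arg 2, AbstractArray is more specific than Union{AbstractArray,Tuple};
# so neither method is strictly more specific and Julia raises a
# MethodError reporting the ambiguity.
err = try
    g(in([1, 2]), [1, 2, 3])
    nothing
catch e
    e
end
```

Restricting the `Fix2` method to `AbstractArray`, as this PR does, removes the `Union` from its signature so the specialized method wins outright.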

@musm
Contributor

musm commented Dec 15, 2020

This looks good to me.
There's only one scenario where the current method's sole benefit is fewer allocations, at the expense of a 2x slowdown. That sounds like a good tradeoff to me.

@mbauman @timholy want to also sign off / review ?

@mbauman
Sponsor Member

mbauman commented Dec 15, 2020

This is rather surprising to me — I suppose it's entirely due to the ability to sum logical arrays and use @inbounds over an array of pre-determined size. I appreciate that this is better performance in many cases, but I dislike how this is harder to opt out of than to opt into. I suppose, though, that — by definition — you're doing vectorized, allocate-y Julia by using findall... and if you were truly chasing peak allocation-free performance, you'd write the iterative find loop yourself.

So with that rationale, I'll give this a ✅. I'll give it a bit more time, but barring further comments let's merge it tomorrow.
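A minimal sketch of the mechanism mbauman describes (not Base's actual implementation): on a Bool array, `count` determines the output length in one SIMD-friendly pass, so the result vector can be allocated once up front and filled with `@inbounds` in a simple loop.

```julia
# Hypothetical helper illustrating why findall on a logical index is fast.
function findall_bool(B::AbstractVector{Bool})
    n = count(B)                   # one cheap pass to size the output exactly
    out = Vector{Int}(undef, n)    # single allocation of known length
    j = 1
    @inbounds for i in eachindex(B)
        if B[i]
            out[j] = i
            j += 1
        end
    end
    return out
end

findall_bool([true, false, true])  # == [1, 3]
```

By contrast, the old fallback had to grow the result incrementally because the number of matches was unknown, which is where the extra allocations in the "Current state" benchmarks come from.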

@musm musm added the domain:arrays, domain:search & find (the find* family of functions), and performance (must go faster) labels on Dec 16, 2020
@mbauman mbauman merged commit d20ca48 into master Dec 18, 2020
@mbauman mbauman deleted the nl/findall branch December 18, 2020 12:28
@nalimilan
Member Author

Thanks!

@garrison
Sponsor Member

This change broke UniqueVectors.jl. My current plan is to issue a new release to fix it, but I am wondering: could there be other fallout throughout the package ecosystem? Seems somewhat unlikely, but perhaps worth considering/checking.

@nalimilan
Member Author

Usually, before a new version is released, all package tests are run against it to detect any problems in advance, so I guess we'll find out at that point.

@mbauman
Sponsor Member

mbauman commented Apr 21, 2021

Would we avoid the ambiguity if it's findall(testf::Function, A::Union{AbstractArray, Tuple})? Broadcasting for tuples is likely faster too.

While we try to avoid ambiguity changes like this, it's really hard to avoid and is easy to fix.

@garrison
Sponsor Member

Would we avoid the ambiguity if it's findall(testf::Function, A::Union{AbstractArray, Tuple})?

Do you mean if we were to replace the signature findall(testf::Function, A::AbstractArray) with findall(testf::Function, A::Union{AbstractArray, Tuple})? The method signature in UniqueVectors that is now ambiguous (following the change in this pull request) is findall(p::Base.Fix2{typeof(in),<:AbstractUniqueVector}, a::Union{Tuple, AbstractArray}). I believe that even if the change you are suggesting is made, the UniqueVector method would still be ambiguous with findall(pred::Fix2{typeof(in)}, x::AbstractArray) and findall(pred::Fix2{typeof(in)}, x::Tuple).

An alternative approach (and one which you might actually be implying) is to replace the three signatures of findall introduced in this PR with a single signature: the one you mentioned, findall(testf::Function, A::Union{AbstractArray, Tuple}). Then this method could e.g. call an inner _findall method, and the dispatching could be performed one level deeper. I think this would result in unambiguous methods, but haven't considered it carefully or tested it.
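A sketch of that alternative (all names hypothetical, not actual Base code): a single public signature forwards to an inner function, so packages specialize the inner `_findall` and never collide with the public method.

```julia
# Generic inner path; specialized methods would dispatch one level deeper.
_findall(testf, A) = findall(map(x -> testf(x)::Bool, collect(A)))

# The single public entry point covering both arrays and tuples.
myfindall(testf::Function, A::Union{AbstractArray,Tuple}) = _findall(testf, A)

# A package like UniqueVectors would then add, e.g.:
#   _findall(pred::Base.Fix2{typeof(in),<:AbstractUniqueVector}, A) = ...
# without creating any ambiguity with the public signature.
```

Since every public call funnels through the same method, ambiguity can only arise among `_findall` methods, which the package controls.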

While we try to avoid ambiguity changes like this, it's really hard to avoid and is easy to fix.

Do you mean to fix it by adjusting julia before the release or to fix it in packages? I am on board either way. If there is no other fallout in the package ecosystem, then the above solution is almost certainly overkill -- the UniqueVectors package is probably the right place to fix it.

@mbauman
Sponsor Member

mbauman commented Apr 21, 2021

Yes, I meant the latter suggestion on both counts. Sorry for the brevity!

ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021