Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

implement count using mapreduce #34048

Merged
merged 1 commit into from
May 2, 2020
Merged

Conversation

stev47
Copy link
Contributor

@stev47 stev47 commented Dec 8, 2019

This creates the same calling interface for count as for e.g. sum,
namely allowing the dims keyword.
The implementation is also shorter than before without sacrificing
performance.
mapreduce with add_sum may even yield performance benefits through
chunking, though this was not observed in simple tests.

@stev47
Copy link
Contributor Author

stev47 commented Dec 9, 2019

buildbot/tester_win32 fail is unrelated

base/reducedim.jl Outdated Show resolved Hide resolved
@ararslan ararslan added needs compat annotation Add !!! compat "Julia x.y" to the docstring needs tests Unit tests are required for this change labels Dec 11, 2019
@stev47
Copy link
Contributor Author

stev47 commented Dec 12, 2019

added compat annotations and tests.
Question: Do we really want count(Any[]) == 0 as count(Any[1]) throws?

@stev47
Copy link
Contributor Author

stev47 commented Dec 14, 2019

buildbot failures are unrelated

@stev47 stev47 force-pushed the feature/count-reduce branch 2 times, most recently from 88cb5df to f976b9a Compare February 2, 2020 11:21
@StefanKarpinski
Copy link
Sponsor Member

Would it make sense for @tkf to review since he overhauled the mapreduce machinery recently?

@tkf
Copy link
Member

tkf commented Feb 3, 2020

Actually, I tweaked only mapfoldl so far. But this patch LGTM anyway.

base/reducedim.jl Outdated Show resolved Hide resolved
@stev47
Copy link
Contributor Author

stev47 commented Apr 6, 2020

this seems ready, freebsd test failure looks unrelated

Comment on lines 389 to 390
count(A::AbstractArray; dims=:) = count(identity, A, dims=dims)
count(f, A::AbstractArray; dims=:) = mapreduce(_bool(f), add_sum, A, dims=dims, init=0)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be nice to use A::AbstractArrayOrBroadcasted instead (with some tests).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, I changed it. You already added a test for Broadcasted and count() in 2f90dde.
Do you still feel we need more?

base/reducedim.jl Outdated Show resolved Hide resolved
test/reducedim.jl Outdated Show resolved Hide resolved
test/reducedim.jl Outdated Show resolved Hide resolved
@tkf tkf removed the needs compat annotation Add !!! compat "Julia x.y" to the docstring label May 1, 2020
This creates the same calling interface for `count` as for other
mapreduce-type functions like e.g. `sum`, namely allowing the `dims`
keyword.
The implementation itself is shorter than before without sacrificing
performance.
More detailed documentation for `count` was added too.
@stev47
Copy link
Contributor Author

stev47 commented May 1, 2020

test fail on linuxaarch64 is worrying but seems unrelated

@tkf tkf removed the needs tests Unit tests are required for this change label May 1, 2020
Copy link
Member

@tkf tkf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tkf tkf added the status:forget me not PRs that one wants to make sure aren't forgotten label May 1, 2020
@tkf tkf merged commit 301db97 into JuliaLang:master May 2, 2020
@tkf tkf removed the status:forget me not PRs that one wants to make sure aren't forgotten label May 2, 2020
@Jutho
Copy link
Contributor

Jutho commented May 6, 2020

I am not quite sure how this is possible, given the minimal code change of this PR, but git bisect points to this commit as the culprit for the following change in type inference:

using LinearAlgebra, Test
X = [randn(n,n) for n = 1:5];
f(iter, init) = mapreduce(sum, +, iter; init = init);
g(iter, init) = mapreduce(norm, +, iter; init = init);
@inferred f(X, 0.)
@inferred g(X, 0.)

f is always inferred before and after this commit, g fails to infer after this commit. However, type inference acts very strangely. In particular, it is important that you run this on a fresh Julia session. In the slightly more elaborate example:

using LinearAlgebra, Test
X = [randn(n,n) for n = 1:5]

f1(iter, init) = reduce(+, sum(b) for b in iter; init = init);
f2(iter, init) = mapreduce(sum, +, iter; init = init);
g1(iter, init) = reduce(+, norm(b) for b in iter; init = init);
g2(iter, init) = mapreduce(norm, +, iter; init = init);
h1(iter, p, init) = reduce(+, norm(b,p) for b in iter; init = init);
h2(iter, p, init) = mapreduce(b->norm(b,p), +, iter; init = init);

@inferred f1(X, 0.)
@inferred f2(X, 0.)
@inferred g1(X, 0.)
@inferred g2(X, 0.)
@inferred h1(X, 2, 0.)
@inferred h2(X, 2, 0.)

f1 and f2 are always inferred, but out of g1, g2, h1 and h2, the one that is ran first is not inferred, and then the others are.

So I guess the norm implementation also plays some role in this, but I am still trying to understand the actual reason for this happening.

@Jutho
Copy link
Contributor

Jutho commented May 6, 2020

I expect the lines

elseif p == 0
        return typeof(float(norm(first(itr))))(count(!iszero, itr))

in norm in LinearAlgebra/src/generic.jl have something to do with this, though norm itself with arbitrary p is perfectly inferred and type stable.

@stev47
Copy link
Contributor Author

stev47 commented May 6, 2020

It seems to work if you add type assertions:

        return typeof(float(norm(first(itr))))(count(!iszero, itr))::typeof(float(norm(first(itr))))

Also for me only g1 is failing, h1 works fine. Maybe the compiler is giving up on inference because stuff is getting too deeply nested with the mapreduce count being called within a mapreduce operation, I'm not sure though.

@Jutho
Copy link
Contributor

Jutho commented May 6, 2020

Regarding failing and working, it is the one which is inferred firstly that fails. So if you change the order of the @inferred statements in my code snippet, such that h1 is ran/inferred first in a fresh Julia session, that one fails, but then g1 works fine. That's what I find extremely surprising/disturbing?

@tkf
Copy link
Member

tkf commented May 6, 2020

It seems to work if you add type assertions:

Maybe we should use convert? There is less guarantee for the type constructor to return the actual type.

That's what I find extremely surprising/disturbing?

Yeah, I agree. Maybe some kind of caching bug? I think it is worth opening a dedicated issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants