Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement accumulate and friends #702

Merged
merged 11 commits into from
Feb 18, 2020
Prev Previous commit
Next Next commit
Comment on why we use vcat
  • Loading branch information
tkf committed Feb 14, 2020
commit 3db20efb479420c44ffef26e4a846ad92cde6cb5
2 changes: 2 additions & 0 deletions src/mapreduce.jl
Original file line number Diff line number Diff line change
Expand Up @@ -309,6 +309,8 @@ _valof(::Val{D}) where D = D
(init = (similar_type(a, Union{}, Size(0))(), init),),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed here is that for length-0 input, the result has eltype of Union{}. I can see why, but in other cases we do try to to have a "sensible" output eltype for convenience, eg SA{Int}[] .+ SA{Int}[] preserves the Int eltype via some hopefully-not-dubious trickery (#664). Is it possible to do something similar here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I noticed here is that for length-0 input, the result has eltype of Union{}.

I don't think it's possible to solve this in general as the output value depends on the function op. I think another (somewhat) sensible output is to use the eltype of input array. But this is not always the valid answer; e.g., accumulate(//, [1, 2]) (note that Base's accumulate(//, Int[]) magically returns Rational{Int64}[] as it uses promote_op).

As all static arrays (including MArray) are shape-immutable, I don't think it is so harmful to return Union{} element type.

My preference is:

  1. Union{}
  2. promote_op(op, eltype(input), eltype(input))
  3. eltype(input)

Copy link
Member

@c42f c42f Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it's possible to solve this in general as the output value depends on the function op.

This is what #664 hacks around for map by using return_type 😬 (but it's no worse than Base which does the same thing for broadcast and collect on empty containers). I've never been quite sure that's the right choice, but if it can't be made to work or is otherwise terrible I'm ok with Union{} :-) Prior to #664 we used Union{} for map and people mostly seemed ok with it.

Copy link
Member Author

@tkf tkf Feb 14, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see. I misread the last half of your first comment.

But I think one difficulty of accumulate compared to map is that output "eltype" is kind of ill-defined (or maybe it's better to say length-dependent). It's conceivable to have "type sequence" like this

op(::A, ::A) :: B
op(::B, ::A) :: C
op(::C, ::A) :: D
op(::D, ::A) :: D  # finally "stabilized"

But, even though B, C, D are all different types, promote_type(B, C, D) may return type E that is a concrete type. It is also easy to have "oscillatory" output:

osc_op(::Nothing, ::Any) = missing
osc_op(::Missing, ::Any) = nothing

So, maybe we should use something like this?

function f(op, acc, x)
    T = typeof(acc)
    while true
        acc = op(acc, x)
        T = promote_type(T, typeof(acc))
        non_existing_value && return T
        # `non_existing_value` modeling truncation arbitrary-sized input.
    end
end

return_type(f, Tuple{typeof(op), typeof(init), eltype(a)})

Impressively, julia can figure out some non-trivial examples like Base._return_type(f, Tuple{typeof(+), Int, Union{Int,Missing}}) and Base._return_type(f, Tuple{typeof(osc_op), Missing, Int}).

I'm not sure if we need to go this far, though. Base is not doing this for accumulate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's confusing to know what the empty case should do here and it's worse than map. With your fix we have

julia> accumulate((a,b)->(a+b)/2, SA[1,2])
2-element SArray{Tuple{2},Float64,1,2} with indices SOneTo(2):
 1.0
 1.5

julia> accumulate((a,b)->(a+b)/2, SA[1])
1-element SArray{Tuple{1},Int64,1,1} with indices SOneTo(1):
 1

julia> accumulate((a,b)->(a+b)/2, SA{Int}[])
0-element SArray{Tuple{0},Int64,1,0} with indices SOneTo(0)

It seems like we must make a somewhat arbitrary choice for the length-0 case with init. You've got

julia> accumulate((a,b)->(a+b)/2, SA{Int}[1], init=0)
1-element SArray{Tuple{1},Float64,1,1} with indices SOneTo(1):
 0.5

julia> accumulate((a,b)->(a+b)/2, SA{Int}[], init=0)
0-element SArray{Tuple{0},Float64,1,0} with indices SOneTo(0)

Overall do you feel like this is an improvement on using Union? I do like that cumsum now preserves numeric eltypes and I feel like that could be helpful for users for the same reason that "fixing" map was.

We do have oddities like

julia> cumsum(Int8[1])
1-element Array{Int64,1}:
 1

julia> cumsum(SA{Int8}[1])
1-element SArray{Tuple{1},Int8,1,1} with indices SOneTo(1):
 1

julia> cumsum(SA{Int8}[1,2])
2-element SArray{Tuple{2},Int64,1,2} with indices SOneTo(2):
 1
 3

I think that inconsistency between here and Base might be harmless though, and it's certainly not clear how to "fix" it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall do you feel like this is an improvement on using Union?

That's a tough question... 😅 Personally, I think we should be using Union{} eltype everywhere when it's not possible to figure out eltype without relying on the compiler. However, we don't have a good toolset and coding convention for dealing with Union{} eltype ("mutate-or-widen" style, even at the surface API level).

But I think the consistency of the API within a package and with respect to Base matters. As map is already using return_type, overall, I think using return_type seems to be a decent solution here.

) do (ys, acc), x
y = rf(acc, x)
# Not using `push(ys, y)` here since we need to widen element type as
# we iterate.
(vcat(ys, SA[y]), y)
c42f marked this conversation as resolved.
Show resolved Hide resolved
end
dims === (:) && return first(results)
Expand Down