Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Functions applied to an empty skipmissing should return missing. #35194

Closed
pdeffebach opened this issue Mar 21, 2020 · 2 comments
Closed

Functions applied to an empty skipmissing should return missing. #35194

pdeffebach opened this issue Mar 21, 2020 · 2 comments

Comments

@pdeffebach
Copy link
Contributor

julia> mean(skipmissing([missing, missing]))
ERROR: MethodError: reduce_empty(::typeof(Base.add_sum), ::Type{Union{}}) is ambiguous. Candidates:
  reduce_empty(::typeof(Base.add_sum), ::Type{T}) where T<:Union{Int16, Int32, Int8} in Base at reduce.jl:231
  reduce_empty(::typeof(Base.add_sum), ::Type{T}) where T<:Union{UInt16, UInt32, UInt8} in Base at reduce.jl:232
Possible fix, define
  reduce_empty(::typeof(Base.add_sum), ::Type{Union{}})
Stacktrace:
 [1] mapreduce_empty(::typeof(identity), ::Function, ::Type) at ./reduce.jl:247
 [2] mapreduce_empty_iter(::Function, ::Function, ::Base.SkipMissing{Array{Missing,1}}, ::Base.HasEltype) at ./reduce.jl:254
 [3] mean(::typeof(identity), ::Base.SkipMissing{Array{Missing,1}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Statistics/src/Statistics.jl:60
 [4] mean(::Base.SkipMissing{Array{Missing,1}}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.3/Statistics/src/Statistics.jl:42
 [5] top-level scope at REPL[110]:1

People use skipmissing to get away from worrying about missings. They shouldn't have to worry about whether or not the vector they are working with is full of missing values.

xref #35050. I would consider this part of a larger set of PRs I have in mind to make working with missings easier.

@pdeffebach
Copy link
Contributor Author

pdeffebach commented Mar 21, 2020

This would probably be very easy to fix for all functions that use reduce, mapreduce or foldl. But I can't think of anything that would work on functions that just call for i in itr etc. But it's not hopeless:

julia> x = Union{Int, Missing}[missing, missing]
2-element Array{Union{Missing, Int64},1}:
 missing
 missing

julia> std(skipmissing(x))
NaN

julia> std(Int[])
NaN

My above example of [missing, missing] is bad, it's better to write mean(Union{Int, Missing}[missing, missing]) since there is dispatch based on the element type. For instance

julia> mean(skipmissing(x))
NaN

It's not obvious why mean of an empty iterator returns NaN. Imo it should return zero(eltype(itr)) but I guess there are larger concerns I don't undestand. The relevant line is here

        return Base.mapreduce_empty_iter(f, Base.add_sum, itr,
                                         Base.IteratorEltype(itr)) / 0

@pdeffebach
Copy link
Contributor Author

X-ref #28777. This issue can be closed I think as i think the current behavior is as intended and earlier devs have thought of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant