Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: Nullable lifting infrastructure #18758

Closed
wants to merge 1 commit into from

Conversation

davidagold
Copy link
Contributor

@davidagold davidagold commented Oct 1, 2016

This PR introduces a generic lifting framework based on the "higher-order lifting" approach. Once this is rebased on top of #18484 it should allow for the following:

julia> g(x::Int) = x
g (generic function with 1 method)

julia> _g = lift(g)
Lifted{#g}(g,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _g(Nullable(1))
Nullable{Int64}(1)

julia> _g(Nullable{Int}())
Nullable{Int64}()

julia> lift(+, Int, Nullable(1), 2)
Nullable{Int64}(3)

julia> lift(+, Int, Nullable(), 2)
Nullable{Int64}()

That is, lift(f::F) returns a Lifted{F}, which, when called on arguments xs..., lowers to lift(f, U, xs...), where U is chosen by type inference. We include the return type parameter U as an argument to lift for cases when U is invariant over many applications of lift, e.g. when mapping some f over a tightly typed NullableArray. Having a Lifted type will allow us to dispatch such higher-order functions on whether or not an f is lifted. This in turn will allow us to define, say,

map{F}(f::Lifted{F}, X::NullableArray)

in a way that takes advantage of the aforementioned invariance.

This PR also implements three-valued logic semantics for lifted & and |:

julia> _and = lift(&)
Lifted{Base.#&}(&,Dict{Tuple{Vararg{DataType,N}},DataType}())

julia> _and(Nullable{Bool}(), Nullable(false))
Nullable{Bool}(false)

This PR could perhaps use some fine-tuning with respect to the use of splatting and whether or not to @inline the lift(f, U, xs...) definitions.

cc: @johnmyleswhite @nalimilan @quinnj @davidanthoff @JeffBezanson @TotalVerb @vchuravy

EDIT: Tests should pass once this is rebased.

@TotalVerb
Copy link
Contributor

I would honestly rather this be a method of broadcast. Conceptually, it makes a lot of sense.

@davidagold
Copy link
Contributor Author

I would honestly rather this be a method of broadcast.

@TotalVerb I'm not opposed to this, as long as there is then a method broadcast(f, U::DataType, x), since having the return type parameter argument U in lift(f, U, x) is very handy when mapping over a NullableArray (really, any array with eltype Nullable{T}). I don't think we want to end up iterating over some such container and calculating Core.Inference.return_type(f, Tuple{T}) at each iteration. Having a Lifted wrapper type would still allow for handy dispatch.

@TotalVerb
Copy link
Contributor

TotalVerb commented Oct 1, 2016

I suppose the issue is that there are two senses of broadcasting here: broadcasting over the nullables themselves (the lifting) and broadcasting the resulting operation over the array. It may be difficult to combine those into one operation, especially one as semantically dense as broadcast (which is quite difficult to understand already).

I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})? And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

@davidagold
Copy link
Contributor Author

That's where the Lifted wrapper may come in handy:

function broadcast{F}(_f::Lifted{F}, X, Y)
    ...
    for (x, y) in broadcasted_indices
        res[i] = broadcast(_f.f, U, x, y)
    end
    res
end

@davidagold
Copy link
Contributor Author

I really don't like the type "annotations" here. Couldn't inference be applied once, with Core.Inference.return_type(f, Tuple{eltype(xs).parameters[1]})?

Would you please elaborate? I'm not sure I understand.

And even if applied to each element of the array, that should be a compile-time cost, and thus in reality only applied once for the program's duration.

We'll need to get somebody who's familiar with these things on the line. My naive thinking was that calling Core.Inference.return_type would run type inference each time it's called. I suppose this all depends on whether or not the results of type inference are cached for each argument signature along with the compiled method.

@TotalVerb
Copy link
Contributor

For lifted(foo), I dislike the implementation using a cache. Consider this:


A naive implementation of lifted(foo) could be like this:

lifted(foo) = (x, y) -> isnull(x) || isnull(y) ? NULL : unsafe_lift(foo, x, y)

where NULL is Nullable{Union{}}() and unsafe_lift is a function that gets unsafely the values of x and y and foos them.

The issue with this function is that it is type-unstable, as it could either return Nullable{Union{}} or Nullable{T}, where T is the actual type of foo(x, y). Let's assume for a minute that it's safe to compute unsafe_lift(foo, x, y) always, again for simplicity. This is obviously not true, as many useful Nullable types have values that are not safe to simply get. But for sake of argument, I would argue the following is a simple and good implementation:

stable_ifelse{T,U}(b::Bool, x::Nullable{T}, y::Nullable{U}) =
    ifelse(b, Nullable{Union{T,U}}(x), Nullable{Union{T,U}}(y))
lifted(foo) = (x, y) -> stable_ifelse(isnull(x) || isnull(y), NULL, unsafe_lift(foo, x, y))

This works for certain values, and has good generated code as far as I can tell. It's based on a combination of promoting distinct nullable types to a stable one. Even better, there is no inference-dependent behaviour at all here.


To me, it's clear that for operations and types that will support it (null_safe_op as it's called), the above code is superior. Now to deal with the tricky case.

Now, for safety, we need to turn this select into a real branch. But we can't make a stable_if like we could with stable_ifelse, without using Core.Inference.return_type. Let's consider two distinct options:

  • Give up type stability of the function in this case. We must try to minimize inference-dependent behaviour, and Nullable{Union{}} really isn't so bad.
  • Use Core.Inference.return_type, but in a sensible way.

For the first option, the implementation is obvious.

For the second, I prefer a solution similar to what map and comprehensions do:

  • if there is a value, use the actual type, and do not try to infer anything. The standard example:
julia> f(x) = rand(Bool) ? 1 : 1.0
f (generic function with 1 method)

julia> map(f, [1])
1-element Array{Int64,1}:
 1

julia> map(f, [1])
1-element Array{Float64,1}:
 1.0
  • if there is no value, try to infer the type and use that.
julia> map(x -> 1, Int[])
0-element Array{Int64,1}

julia> map(f, Int[])
0-element Array{Union{Float64,Int64},1}

So to be consistent, we could do:

lifted2(foo) = (x, y) -> isnull(x) || isnull(y) ?
    Nullable{Core.Inference.return_type(foo, Tuple{typeof(x).parameters[1], typeof(y).parameters[1]})}() :
    unsafe_lift(foo, x, y)

But I would actually argue against consistency here, because if we are going to use the actual type in the real case, then we should avoid non-concrete parameters in Nullable. By that I mean that Nullable{Union{Float64,Int64}}() is of little use, since the inferred type cannot be stable anyway. Instead, I think it would be best to use the following rule:

  • if there is no value, try to infer the type, and use that if and only if it's concrete; otherwise use Union{}

Which we can implement as:

Base.@pure inferred_concrete_return_type{T,U}(f,::Type{T},::Type{U}) =
    let X = Core.Inference.return_type(f,Tuple{T,U})
        isleaftype(X) ? X : Union{}
    end

lifted2(foo) = function{T,U}(x::Nullable{T}, y::Nullable{U}) -> isnull(x) || isnull(y) ?
    Nullable{inferred_concrete_return_type(foo, Tuple{T,U})}() :
    unsafe_lift(foo, x, y)

As-is, this will not be type stable. However, I think it could be manually made type stable through inference special-casing. This function has some degree of utility outside of nullables, so a case could be made for such a special case.

@TotalVerb
Copy link
Contributor

@davidagold Inference being run twice is not a concern. However, the result of inference not actually making the function type stable is indeed a concern. If we can't make the function type stable, it's obviously better just to return NULL. I think that can be made faster with some optimizations in the compiler.

@davidanthoff
Copy link
Contributor

Why does this have to be in base, couldn't this stay in a package?

@TotalVerb
Copy link
Contributor

You mean the entire nullable infrastructure, or just the lifting part?

@davidanthoff
Copy link
Contributor

Just the stuff in this PR. It is not really clear to me where this would be used. I guess in NullableArrays, but couldn't it be in that package in that case?

isnull(x),
ifelse(
isnull(y),
Nullable{Bool}(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this uses a lot of vertical space, would be better with more than a single token per line

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since Bool is safe to evaluate even when missing, you could also use the two-argument form of Nullable to make this more compact.

@andyferris
Copy link
Member

Is lift too generic of a name to put into base like this? I see the meaning is clear in the context of nullables, but why not use a name which is clear without that context? (e.g. liftnull, and LiftNulls, which I admit is slightly longer to write...)

@kshyatt kshyatt added the domain:missing data Base.missing and related functionality label Oct 3, 2016
@nalimilan
Copy link
Member

Unless we have another potential use for the term lift, I would keep it that way. Though we could probably start in a package, together with lifted operators. Anyway we'll need that for 0.5 support.

if isnull(x)
return Nullable{U}()
else
return Nullable{U}(f(unsafe_get(x)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really related, but I wonder whether the compiler isn't actually smart enough to generate the same code when using get. Is unsafe_get really needed except in very specific cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I use unsafe_get here because it is generic over Nullable and non-Nullable types, whereas get(x) for unqualified x is not defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. That difference is a bit weird though. Maybe we should move to unwrap and have it work for any scalar.

@ViralBShah
Copy link
Member

lift is used in Reactive.jl too. Cc @shashi

@ViralBShah
Copy link
Member

I still think this is a good use of the word lift.

@andyferris
Copy link
Member

Some time ago I heard of talk of having "automatic" lifting for elementary operations like + and so on such that most generic functions would work correctly with Nullables - is this also still planned?

If so, is this PR is to allow us to address dispatch issues mostly? Or is it instead of automatic lifting entirely?

@shashi
Copy link
Contributor

shashi commented Oct 4, 2016

@ViralBShah in Reactive lift got renamed to map a while ago.

@nalimilan
Copy link
Member

Some time ago I heard of talk of having "automatic" lifting for elementary operations like + and so on such that most generic functions would work correctly with Nullables - is this also still planned?

If so, is this PR is to allow us to address dispatch issues mostly? Or is it instead of automatic lifting entirely?

@andyferris AFAIK there are no plans for automatic lifting of all functions. But NullableArrays.jl currently includes lifted operators, and I plan to open a PR again to move them into Base (following up #16988).

@martinholters
Copy link
Member

I know close to nothing of category theory, but it's still enough that I would expect lift(f) to yield a function that can be applied to whatever container-like thing, be it a Nullable or an Array. (And lift(lift(f)) to an Array of Nullables.) From that it would follow that Lifted{f}(xs...) should forward to broadcast(f, xs...) and broadcast should take care of unwrapping Nullables as suitable. (And then one can ask oneself whether lift is needed at all.)

BUT this gets extremely fishy if different container-likes get merged, like broadcast(+, Nullable(3), [1, 2]). Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull. More theoretically pleasing, but not necessarily more useful, one could require the above to be written as broadcast(lift(+), Nullable(3), Nullable([1, 2])) or broadcast(lift(+), [Nullable(3)], [1, 2]), depending on the desired result.

@ViralBShah
Copy link
Member

I would think that for the case of arrays, what I would want is always Nullable([4, 5]).

@nalimilan
Copy link
Member

Should this be Nullable([4, 5]) or [Nullable(4), Nullable(5)]? Or forbidden? The pragmatic way would probably be to have the infrastructure for Nullables stay separate and to reflect it in the name - e.g. liftnull.

I think that's why @davidagold proposes lift here instead of doing lifting using broadcast as in #16961. The bikeshedding on the name really is a detail.

@davidagold
Copy link
Contributor Author

@TotalVerb Thank you for your thorough explanation. I really appreciate your taking the time to go through those thoughts, and I'd be interested to hear more about the compiler/inference work that would complement the present proposal.

@andyferris I'm not exactly sure anymore what precisely "automatic lifting" means. In this case it sounds like it means that f(x::Nullable{T}) "just works" given a method f(x::T). This PR won't provide for that behavior. Short of work in the compiler, I think the way to provide such behavior is actually to implement f(x::Nullable{T}).

But we could also consider "automatic lifting" restricted to a particular context, such as a macro invocation: @blah f(x) for x::Nullable{T}. Using such a macro it is possible to replace f(x) with lift(f, x). Now, if the entire point of the macro is to replace f(x) with lift(f, x), then one would hardly call this "automatic lifting". But if the macro has another purpose -- say, it is a querying macro that does other syntactic transformations, such as interpreting attributes (e.g. sepal_length) as column references (e.g. df[:sepal_length]) -- then one can throw in the replacement of f(x) with lift(f, x) for free. In that sense, we could think about "automatic lifting" as being a feature of querying macros, but not applicable to unadorned expressions f(x).

Given the ambiguity of the phrase "automatic lifting", I've found it more helpful to think in terms of "lifting by some method or other". So, the first approach, in which we define the method f(x::Nullable{T}), might be called something like "method extension lifting". The second approach -- the one implemented in this PR -- might be called something like "higher-order lifting". Then we could note that method extension lifting gives "proper" behavior to f(x::Nullable{T}) in all contexts, whereas higher-order lifting gives "proper" behavior to f(x::Nullable{T}) only in the context of a supporting macro.

@nalimilan
Copy link
Member

@TotalVerb's solution sounds good. Though, if returning Union{} in case of type instability requires some work in inference code, I'd rather start with the easier solution of returning the actual inferred union, and improve things later.

@davidanthoff
Copy link
Contributor

I'd like to raise the question again whether this has to be in base at this point, or whether this could live in a package. Given that this a) doesn't add additional methods to base functions for base types and b) is not used by anything else in base, it seems to me that this could well live in a package, mature there and once we have a better sense whether this is a good strategy or not it could still be put into base at a later point.

@davidagold
Copy link
Contributor Author

@davidanthoff I think you're probably right that this can mature in a package for now. But having this PR against Base has induced some very helpful discussion, so maybe it's worth keeping open and updating periodically.

@TotalVerb I'm sorry I didn't believe you re: inference being run twice:

julia> function lift(f, x)
           U = Core.Inference.return_type(f, Tuple{eltype(typeof(x))})
           if isnull(x)
               return Nullable{U}()
           else
               return Nullable(f(unsafe_get(x)))
           end
       end
lift (generic function with 1 method)

julia> f(x) = x
f (generic function with 1 method)

julia> @code_warntype lift(f, 1)
Variables:
  #self#::#lift
  f::#f
  x::Int64
  U::Type{Int64}

Body:
  begin 
      U::Type{Int64} = $(QuoteNode(Int64)) # line 3:
      unless $(QuoteNode(false)) goto 6 # line 4:
      return $(Expr(:new, Nullable{Int64}, false))
      6:  # line 6:
      return $(Expr(:new, Nullable{Int64}, true, :(x)))
  end::Nullable{Int64}

That's really neat. Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

"""
immutable Lifted{F}
f::F
cache::Dict{Tuple{Vararg{DataType}}, DataType}
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This cache should be removed. The compiler already implements it.

NOTE: There are two exceptions to the above: `lift(|, Bool, x, y)` and
`lift(&, Bool, x, y)`. These methods both follow three-valued logic semantics.
"""
function lift(f, U::DataType, x)
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should probably be Type instead of DataType.

@JeffBezanson
Copy link
Sponsor Member

Does this mean that Core.Inference.return_type is lowered differently than a standard function call?

No, it just means that inference can infer its return value.

@davidagold
Copy link
Contributor Author

Given the comments above, it seemed appropriate to do return typing inside lift. So, the method signature is now just, e.g. lift(f, x).

Should lift(f) return a lambda or a Lifted wrapper?

@KristofferC
Copy link
Sponsor Member

Nullable is no longer in Base

@KristofferC KristofferC closed this Jan 7, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet