Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Are isbits-Unions isbits? #23567

Closed
quinnj opened this issue Sep 2, 2017 · 8 comments · Fixed by #25496
Closed

Are isbits-Unions isbits? #23567

quinnj opened this issue Sep 2, 2017 · 8 comments · Fixed by #25496

Comments

@quinnj
Copy link
Member

quinnj commented Sep 2, 2017

Noticed in the current master bug:

julia> t = Vector{Union{Float64, Void}}(5)
5-element Array{Union{Void, Float64},1}:
 nothing
 nothing
 nothing
 nothing
 nothing

julia> b = collect(Union{Float64, Void}, 1.0:3.0)
3-element Array{Union{Void, Float64},1}:
 1.0
 2.0
 3.0

julia> copy!(t, 2, b)
5-element Array{Union{Void, Float64},1}:
 0.0
 1.0
 2.0
  nothing
  nothing

The issue is that we currently go thru the codepath in array.jl

function unsafe_copy!(dest::Array{T}, doffs, src::Array{T}, soffs, n) where T
    if isbits(T)
        unsafe_copy!(pointer(dest, doffs), pointer(src, soffs), n)
    else
        ccall(:jl_array_ptr_copy, Void, (Any, Ptr{Void}, Any, Ptr{Void}, Int),
              dest, pointer(dest, doffs), src, pointer(src, soffs), n)
    end

but isbits(Union{Float64, Void}) is false; so we hit jl_array_ptr_copy. Now, I did had a hook in jl_array_ptr_copy to intercept isbits-Union arrays and handle them specially, the problem is it currently only works when you happen to be copying to the 1st element of the dest array. This is because we only pass dest_p, a pointer and not an index, to jl_array_ptr_copy, so we don't actually know where to copy selector bytes from src to dest.

There a couple of ways to address this:

  • go ahead and make isbits(Union{Float64, Void}) == true, this would involve making sure we're careful in a # of places in Base where we're currently checking isbits(T) on arrays and doing optimizations (i.e. vcat, unsafe_copy, flipdim). The biggest ? for me in doing this is how this propagates thru the rest of the system, in particular, inference.jl (which has quite a few calls to isbits(T).). Overall, I think it would be more correct, but it does seem a bit worrisome not quite knowing the full extent of this kind of change. I can certainly do more research here if we think it's the best way to try out first.
  • make another method jl_array_ptr_copy2, which also takes ind arguments so we can correctly adjust when copying isbits-Union arrays.

Ultimately, I think we should make isbits-Unions isbits, because we're essentially doing that everywhere in src/, so we might as well be consistent in Base.

cc: @JeffBezanson, @vtjnash

@yuyichao
Copy link
Contributor

yuyichao commented Sep 2, 2017

The whole point of specializing this C function only for pointer arrays is that the compiler can do all the type checks instead of doing it at runtime so whether or not we make isbits-Union isbits the special case should be moved out of the C function. It was added as a C function only because calling memcpy on pointer arrays is invalid. This is not needed for isbits Union array so it (isbits Union) shouldn't need a C implementation. (You can implement it in C, of course, if there's nothing to be specialized).

In any case, it seems that you have to implement a separate branch for isbits-Union here since it needs a completely different fast path (two copies, one for content, one for tag) and that branch should be implemented in julia. Therefore, fixing this issue is independent with the value of isbits(Union{Float64, Void}).

@quinnj
Copy link
Member Author

quinnj commented Sep 4, 2017

Update: took a stab at making isbits Unions isbits and it seems pretty hairy. While we have optimizations for isbits Union array eltypes and fieldtypes, we still have jl_isbits(::Union) == false; allowing that would have pretty far-reaching effects as it gets used all over codegen/intrinsics/jltypes and more. Similarly w/ Base, as I mentioned above, the effects on inference.jl and other places just seem like changing the system too drastically.

The alternative approach seems much more sane, implemented as the PR #23577.

I tried to audit other places in Base where we're branching on isbits(T) and most others seem ok; for example, there are the optimization branches in nullable.jl that should be safe for isbits Unions, but doesn't seem like that big a deal to miss out on. Similarly in SharedArray, we could technically allow isbits Union arrays, but not super critical. The references I wasn't quite sure on where in refpointer.jl

function unsafe_convert(P::Type{Ptr{T}}, b::RefValue{T}) where T
    if isbits(T)
        return convert(P, pointer_from_objref(b))
    elseif isleaftype(T)
        return convert(P, pointer_from_objref(b.x))
    else
        # If the slot is not leaf type, it could be either isbits or not.
        # If it is actually an isbits type and the type inference can infer that
        # it can rebox the `b.x` if we simply call `pointer_from_objref(b.x)` on it.
        # Instead, explicitly load the pointer from the `RefValue` so that the pointer
        # is the same as the one rooted in the `RefValue` object.
        return convert(P, pointerref(Ptr{Ptr{Void}}(pointer_from_objref(b)), 1, 0))
    end
end

There were also a few cases in deepcopy/serialize, but I thought we had handled those already in the /src code changes. If anyone has any insight on these cases or other places we need to change, I'm happy to figure out what needs to happen.

@yuyichao
Copy link
Contributor

yuyichao commented Sep 4, 2017

unsafe_convert

I think it can share the isbits case here. That should give it a pointer to the "boxed" object.

@quinnj
Copy link
Member Author

quinnj commented Sep 4, 2017

So we'd have

function unsafe_convert(P::Type{Ptr{T}}, b::RefValue{T}) where T
    if isbits(T) || isbitsunion(T)
        return convert(P, pointer_from_objref(b))
    elseif isleaftype(T)
        return convert(P, pointer_from_objref(b.x))
    else
        # If the slot is not leaf type, it could be either isbits or not.
        # If it is actually an isbits type and the type inference can infer that
        # it can rebox the `b.x` if we simply call `pointer_from_objref(b.x)` on it.
        # Instead, explicitly load the pointer from the `RefValue` so that the pointer
        # is the same as the one rooted in the `RefValue` object.
        return convert(P, pointerref(Ptr{Ptr{Void}}(pointer_from_objref(b)), 1, 0))
    end
end

?

@yuyichao
Copy link
Contributor

yuyichao commented Sep 4, 2017

Yes

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 4, 2017

We should probably work towards deprecating isbits, as it no longer will provide a meaning prediction of layout in the next release. The predicates of datatype_pointerfree and alloc_inline (possible others too?) can replace it. The exact concept represented by isbits is perhaps most nearly is_pod (plain old data) and should perhaps be preserved as executable documentation of the stable c-compatible ABI.

@vtjnash vtjnash mentioned this issue Sep 4, 2017
@quinnj
Copy link
Member Author

quinnj commented Sep 4, 2017

@vtjnash, I had a similar concern looking back over the original isbitsunion definition (performance wise). Would it be better to define it like

isbitsunion(u::Union) = isbitsunion(u.a) & isbitsunion(u.b)
isbitsunion(x) = isbits(x)

Still probably not quite as performant as your jl_is_layout_inline but pretty close?

@quinnj
Copy link
Member Author

quinnj commented Sep 5, 2017

So it looks like my revised definition above is about 22% faster, which is decent, but not quite as fast as I was expecting. So then I decided to just call the same function we call on the backend (jl_islayout_inline) and that was loads faster. See perf details here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants