Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure immutable isbits Union fields get set correctly #23367

Merged
merged 6 commits into from
Sep 7, 2017
Merged

Conversation

quinnj
Copy link
Member

@quinnj quinnj commented Aug 20, 2017

@vtjnash, this isn't quite right yet. When constructed, the selector byte and any non-union fields aren't getting set correctly.

I'm kind of just taking a stab here, and I'm guessing that my use of fval_info and dest are not quite right, hence the non-working solution here. For example, when I run

struct BitsUnion
    x::Union{Int64, Float64}
end
BitsUnion(5.0)

the fval_info is

(lldb) call jl_(fval_info.constant)
#<null>
(lldb) call jl_(fval_info.typ)
Float64
(lldb) p fval_info.isimmutable
(const bool) $0 = false

I'm not sure why the fval_info.constant isn't set, but that causes problems when we try to compute_tindex_unboxed later against BitsUnion.

I won't be able to do much else on this for the rest of the day, but any pointers/tips/direction you or @yuyichao could provide would be much appreciated.

fix #23351

@kshyatt kshyatt added the domain:types and dispatch Types, subtyping and method dispatch label Aug 20, 2017
@quinnj
Copy link
Member Author

quinnj commented Aug 21, 2017

Ok, a little more sleuthing:

  • Turns out when you call julia_type_to_llvm on Union{Int64, Float64}, the resulting Type * has isPointerTy() == true
  • I believe that's why the value setting isn't happening correctly, since emit_unbox thinks it's dealing w/ a pointer type when it should be unboxing inline as a Float64 or Int64

I'm not quite sure what the right solution here is though. Modify emit_unbox? Lie to LLVM and call emit_unbox w/ the largest Union element?

I feel like there still might be another issue going on, because in the llvm code for BitsUnion(5.0), I still don't see a selector byte store for some reason. Would that be getting optimized out for some reason? Or "clobbered" somehow due to the Union-is-a-pointertype issue?

src/builtins.c Outdated
@@ -259,6 +259,10 @@ JL_CALLABLE(jl_f_sizeof)
jl_value_t *x = args[0];
if (jl_is_unionall(x) || jl_is_uniontype(x)) {
x = jl_unwrap_unionall(x);
size_t elsize = 0, al = 0;
int isinline = jl_islayout_inline(x, &elsize, &al);
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will have the curious property that T == S will not imply sizeof(T) == sizeof(S). I'm not sure if that's going to be an issue, but it does make be nervous. In particular, consider Union{Tuple{Int64}, Tuple{Int8}}, which doesn't include selector bits, vs. Tuple{Union{Int64, Int8}}, which throws an error.

@quinnj
Copy link
Member Author

quinnj commented Aug 21, 2017

Bump @vtjnash, I could really use your help/direction here.

@quinnj
Copy link
Member Author

quinnj commented Aug 23, 2017

Alright, added some tests for @vtjnash's help w/ the fix. Let's see what CI says. Anyone else want to take a look?

@quinnj
Copy link
Member Author

quinnj commented Aug 23, 2017

That's a lot of great green checkmarks; a shame that Travis is backed up so far.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Aug 23, 2017

Looks like Travis caught another bug though:

From worker 5:	compare_fields at /home/travis/build/JuliaLang/julia/src/builtins.c:94

(that line is assuming isbits_layout === isdatatype)

@quinnj
Copy link
Member Author

quinnj commented Aug 24, 2017

Mac failure is #23215; everything else looks all good.

@vtjnash vtjnash force-pushed the jq/bitsunions branch 6 times, most recently from 16863b9 to 7cf7580 Compare August 30, 2017 17:24
@vtjnash vtjnash changed the title [WIP] Ensure immutable isbits Union fields get set correctly Ensure immutable isbits Union fields get set correctly Aug 30, 2017
@Keno
Copy link
Member

Keno commented Sep 1, 2017

@vtjnash is this good to go?

depth);
jl_datatype_t *ft = (jl_datatype_t*)jl_field_type(vt, i);
if (jl_is_uniontype(ft)) {
uint8_t sel = ((uint8_t*)fld_ptr)[jl_field_size(vt, i) - 1];
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This operation should be abstracted.

@quinnj
Copy link
Member Author

quinnj commented Sep 2, 2017

I'm still seeing a segfault w/

struct A
    x::Union{Float64, Void}
end

getx(a::A) = a.x

getx(A(1.0))

@quinnj
Copy link
Member Author

quinnj commented Sep 2, 2017

Ok, looks like the segfault is coming from

data = ctx.builder.CreateSelect(isboxed_union,
-> 5661	                                    ctx.builder.CreateLoad(retvalinfo.gcroot),
   5662	                                    data);

when retvalinfo.gcroot, which is NULL, is accessed. Note that I'm running master merged w/ this branch locally, in case that makes a difference.

I also see retvalinfo.ispointer() == true if that makes a difference.

(lldb) p retvalinfo.ispointer()
(bool) $11 = true
(lldb) call jl_(retvalinfo.typ)
Union{Float64, Int64}
(lldb) p retvalinfo.gcroot
(llvm::Value *) $12 = 0x0000000000000000
(lldb)

I'm not really sure what needs to be done here though.

@Keno
Copy link
Member

Keno commented Sep 3, 2017

@quinnj You might try rebasing on top of #23571, which cleans up some of this code. I'll take a look if it still crashes.

@quinnj
Copy link
Member Author

quinnj commented Sep 4, 2017

That indeed seems to have fixed the problem @Keno. I've pushed this branch as rebased on top of #23571

@quinnj
Copy link
Member Author

quinnj commented Sep 4, 2017

Hmmm, travis linux 64-bit & julia freebsd ci both stalled during tests; I imagine unrelated? Have we seen those kinds of hangs elsewhere?

@nalimilan
Copy link
Member

Yes, I've seen a hang today in another PR.

When the new gc root placement pass was introduced, the .gcroot field
lost its original meaning and became mostly superfluous. However, the
codegen code for unions still used it to get a referenced to the boxed
part of the split union. Clean that up by introducing an explicit field
for that purpose and removing the .gcroot field.
@quinnj
Copy link
Member Author

quinnj commented Sep 5, 2017

Ok, rebased this again on top of @Keno 's new gcroot commit. Also added a test for the codgen-getfield crash I saw above. Let's see what CI says on all of this.

@quinnj
Copy link
Member Author

quinnj commented Sep 5, 2017

CI all looks good, just the one hang on 32-bit travis linux. I'll let @Keno & @vtjnash decide when this is good to go. I'll keep doing more testing on my end.

CI->getType(), "", CI);
ASCI->takeName(CI);
CI->replaceAllUsesWith(ASCI);
auto *ptr = new PtrToIntInst(CI->getOperand(0), CI->getType(), "", CI);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This isn't valid by IR invariants (it'll complain on LLVM 5.0+). Why does this need to be an int?

Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what CI->getType() is. We can worry about LLVM 5 when we start supporting it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuyichao is running in that configuration AFAIK.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think 4.0 already has non-integral pointer so this will complain there already.

I'm on 4.0 but I do test svn from time to time to make sure it's ready whenever Arch's package is updated.

Should be enough to just keep the old addrspacecast and add an ptrtoint on top?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is not addressed and it breaks LLVM 4.0 .

@@ -76,6 +76,8 @@ end
a = Base.cconvert(Ptr{LibGit2.StrArrayStruct}, p)
b = Base.unsafe_convert(Ptr{LibGit2.StrArrayStruct}, a)
@test p == convert(Vector{String}, unsafe_load(b))
@noinline gcuse(a) = a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should use Core.gcuse now.

@quinnj quinnj merged commit 1875957 into master Sep 7, 2017
@quinnj quinnj deleted the jq/bitsunions branch September 7, 2017 16:53
@StefanKarpinski
Copy link
Sponsor Member

❤️

@maleadt
Copy link
Member

maleadt commented Sep 7, 2017

This breaks llvmcall(fun::Ptr) pretty spectacularly:

$ gdb julia

(gdb) break ccall.cpp:1083
Breakpoint 1 (ccall.cpp:1083) pending.

(gdb) run
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: https://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.7.0-DEV.1704 (2017-09-07 16:53 UTC)
 _/ |\__'_|_|_|\__'_|  |  Commit 1875957dec* (0 days old master)
|__/                   |  x86_64-unknown-linux-gnu

julia> x() = Base.llvmcall(C_NULL, Void, Tuple{Ptr{Int}, Int}, C_NULL, 1)
x (generic function with 1 method)

julia> x()

Thread 1 "julia" hit Breakpoint 1, emit_llvmcall (ctx=..., args=0x7fffeb1c6750, nargs=5) at /home/tbesard/Work/Julia-CUDA/julia-dev/src/ccall.cpp:1083
1083            assert(isPtr);

(gdb) call argtypes[0]->dump()
i64

(gdb) call argtypes[1]->dump()
i64

As seen on LLVM.jl, bisected to 1875957.

@vtjnash
Copy link
Sponsor Member

vtjnash commented Sep 7, 2017

llvmcall is still experimental (unstable) anyways

@maleadt
Copy link
Member

maleadt commented Sep 8, 2017

llvmcall is still experimental (unstable) anyways

Yeah sure stuff breaks on master all the time, just reporting the fact.

@maleadt
Copy link
Member

maleadt commented Sep 8, 2017

OK, so looking at the actual commits this seems fully intended. Is this anonymization of pointers to native-width integers going to be the default from now on?
Even though llvmcall is unstable, this seems unnecessarily breaking..

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:types and dispatch Types, subtyping and method dispatch
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Memory corruption in Pair
9 participants