Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boxed tuples leaking into generated code #21121

Closed
maleadt opened this issue Mar 21, 2017 · 5 comments · Fixed by #21277
Closed

Boxed tuples leaking into generated code #21121

maleadt opened this issue Mar 21, 2017 · 5 comments · Fixed by #21277
Labels
compiler:codegen Generation of LLVM IR and native code kind:bug Indicates an unexpected problem or unintended behavior
Milestone

Comments

@maleadt
Copy link
Member

maleadt commented Mar 21, 2017

I've got an issue with generating a literal tuple from a macro → generated function, where in some cases the tuple in the AST is a boxed, breaking optimizations and (in my case) resulting in GPU incompatible code due to a literal pointer.

struct Wrapper
    tuple::NTuple{1,Int}
    Wrapper(tuple::NTuple{1,Int}) = new(tuple)
end

macro wrap(shape)
    return esc(:(generated_wrap(Val{$shape})))
end

function emit_wrap(shape)
    quote
        Base.@_inline_meta
        Wrapper($shape)
    end
end

@generated function generated_wrap{S}(::Type{Val{S}})
    return emit_wrap(S)
end

function kernel()
    wrapped = @wrap((42,))
    if 100 > wrapped.tuple[1]
        ccall("llvm.trap", llvmcall, Void, ())
    end
end
@code_lowered(kernel()) = CodeInfo(:(begin 
        nothing
        wrapped = (Main.generated_wrap)((Core.apply_type)(Main.Val, (Core.tuple)(42)))
        unless 100 > (Main.getindex)((Core.getfield)(wrapped, :tuple), 1) goto 7
        return $(Expr(:foreigncall, "llvm.trap", Void, svec(), :($(Expr(:llvmcall)))))
        7: 
        return
    end))


@code_typed(kernel()) = CodeInfo(:(begin 
        $(Expr(:inbounds, false))
        SSAValue(1) = (42,)
        $(Expr(:inbounds, :pop))
        unless (Base.slt_int)((Base.getfield)(SSAValue(1), 1)::Int64, 100)::Bool goto 13
        return $(Expr(:foreigncall, "llvm.trap", Void, svec(), :($(Expr(:llvmcall)))))
        13: 
        return
    end))=>Void


@code_llvm(kernel())
    define void @julia_kernel_68524() #0 !dbg !5 {
    top:
      %0 = load i64, i64* inttoptr (i64 140469690037328 to i64*), align 16
      %1 = icmp sgt i64 %0, 99
      br i1 %1, label %L13, label %if
    
    if:                                               ; preds = %top
      call void @llvm.trap()
      unreachable
    
    L13:                                              ; preds = %top
      ret void
    }

If instead generated_wrap returns e.g. Wrapper((42,)), code_lowered is identical but the other two are different:

@code_typed(kernel()) = CodeInfo(:(begin
        unless (Base.slt_int)(42, 100)::Bool goto 5
        return $(Expr(:foreigncall, "llvm.trap", Void, svec(), :($(Expr(:llvmcall)))))
        5: 
        return
    end))=>Void


@code_llvm(kernel())
    define void @julia_kernel_68518() #0 !dbg !5 {
    top:
      call void @llvm.trap()
      unreachable
    }

On the CPU side, the generated code seems suboptimal but is still correct. On the GPU side however, we can't deal with these literal pointers to CPU memory. That's why I have added a codegenparam allow_alloc, which is passed to static_eval, but apparently there's other codegen paths that can emit literal pointers... Is this a bug, i.e. should I only expect static_eval to perform compile-time allocations and emit references to them, or are there other places that need to be guarded by such a codegenparam?

Bisected to #20853
cc @JeffBezanson, @cfoket

Version info:

Julia Version 0.6.0-pre.alpha.206
Commit 90cfb8292c* (2017-03-21 03:12 UTC)
Platform Info:
  OS: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, ivybridge)
@maleadt maleadt added kind:bug Indicates an unexpected problem or unintended behavior compiler:codegen Generation of LLVM IR and native code labels Mar 21, 2017
@JeffBezanson
Copy link
Sponsor Member

Aha, @Keno has also recently been having trouble with this (related to #21074). Apparently our codegen emits llvm literals when a constant is used to initialize a slot, but opaque pointers when it is used to initialize an ssavalue. #20853 did not cause this, but makes it more prevalent due to replacing slots with ssavalues in more cases.

We need to fix codegen not to use these opaque pointers so often. As part of this, we'll probably need to keep around (during codegen) both a literal and boxed representation of a value, in case the value needs to be boxed (to avoid allocating the box at runtime).

@yuyichao
Copy link
Contributor

Basically dup of #18387 ?

@Keno
Copy link
Member

Keno commented Mar 21, 2017

I have this commit locally:

commit a4937a7208a3bb2fb45a9d493be159eec6084b8f
Author: Keno Fischer <[email protected]>
Date:   Mon Mar 20 19:32:12 2017 +0000

    Special case code generation for const SSAvals (21074)

diff --git a/src/codegen.cpp b/src/codegen.cpp
index b167071..f5da443 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -3702,6 +3702,16 @@ static void emit_assignment(jl_value_t *l, jl_value_t *r, jl_codectx_t *ctx)
                 emit_unbox(vtype, slot, slot.typ, dest);
                 slot = mark_julia_slot(dest, slot.typ, NULL, tbaa_stack);
             }
+        } else if (slot.isboxed && slot.constant && slot.isimmutable &&
+          jl_array_store_unboxed(slot.typ)) {
+            // Stack allocate a copy of this constant so LLVM can do constant
+            // propagation.
+            bool isboxed;
+            Type *vtype = julia_type_to_llvm(slot.typ, &isboxed);
+            assert(!isboxed);
+            Value *dest = emit_static_alloca(vtype);
+            emit_unbox(vtype, slot, slot.typ, dest);
+            slot = mark_julia_slot(dest, slot.typ, NULL, tbaa_stack);
         }
         ctx->SAvalues.at(idx) = slot; // now SAvalues[idx] contains the SAvalue
         ctx->ssavalue_assigned.at(idx) = true;

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 21, 2017

you shouldn't discard the constant bit so readily, it may inhibit downstream optimization

@vtjnash
Copy link
Sponsor Member

vtjnash commented Mar 21, 2017

We need to fix codegen not to use these opaque pointers so often

Yeah, basically we probably just want to have data_pointer emit a pointer to a GlobalVariable constant instead of the opaque pointer. It might waste a bit of memory, but since it also can often enable other LLVM optimizations (as noted here), it's probably the right move. This used to trigger bugs in the prepare_globals resolution algorithm. But since removing that (in preference for jl_merge_modules), I hadn't fully revisited this (other than adding the special case for Slot :/).

@JeffBezanson JeffBezanson added this to the 0.6.0 milestone Mar 23, 2017
vtjnash added a commit that referenced this issue Apr 4, 2017
this allows llvm to optimize constant data access
fix #21121
vtjnash added a commit that referenced this issue Apr 5, 2017
this allows llvm to optimize constant data access
fix #21121
vtjnash added a commit that referenced this issue Apr 5, 2017
this allows llvm to optimize constant data access
fix #21121
maleadt added a commit to JuliaGPU/CUDAnative.jl that referenced this issue Apr 7, 2017
maleadt added a commit to JuliaGPU/CUDAnative.jl that referenced this issue Apr 7, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:codegen Generation of LLVM IR and native code kind:bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants