
improve optimization passes to produce more compact IR #20853

Merged

6 commits merged into master from jb/IRcleanup on Mar 10, 2017

Conversation

JeffBezanson
Sponsor Member

I've noticed a few functions with pretty bloated IR (redundant variables, useless statements, etc.). This improves the optimization passes to address some of these cases. An extreme example is vector+vector broadcast, which with these changes goes from 248 statements to 127.
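For a rough sense of how such statement counts can be measured, here is a sketch (the exact return shape of code_typed has changed across Julia versions, so treat this as illustrative rather than exact):

f(a, b) = a .+ b

# On recent Julia versions code_typed returns CodeInfo => return-type pairs;
# the number of statements in the optimized IR is the length of the code field.
ci, rettype = first(code_typed(f, Tuple{Vector{Float64}, Vector{Float64}}))
println("statements in optimized IR: ", length(ci.code))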

@@ -305,7 +305,8 @@ const workq = Vector{InferenceState}() # set of InferenceState objects that can

#### helper functions ####

@inline slot_id(s) = isa(s, SlotNumber) ? (s::SlotNumber).id : (s::TypedSlot).id # using a function to ensure we can infer this
@inline slot_id(s::Slot) =
Sponsor Member

type-inference should be good enough now that this won't cause a regression, but I'm not sure it's ideal to require it

Sponsor Member Author

Why? How could adding this declaration cause a regression?

Sponsor Member

it blocks inlining, turning the isa tests into a dynamic dispatch, unless we've correctly inferred that the value is <: Slot
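A hedged illustration of the trade-off in this exchange, using simplified stand-in types rather than the real inference.jl definitions:

abstract type Slot end
struct SlotNumber <: Slot
    id::Int
end
struct TypedSlot <: Slot
    id::Int
    typ
end

# Untyped signature: the isa checks sit inside the (inlineable) body, so a
# caller that only knows the argument as Any can still have them inlined.
@inline slot_id_untyped(s) = isa(s, SlotNumber) ? (s::SlotNumber).id : (s::TypedSlot).id

# ::Slot signature: if the caller has not inferred the argument as <: Slot,
# the call itself becomes a dynamic dispatch on the annotated method instead.
@inline slot_id_typed(s::Slot) = s.id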

@@ -3359,7 +3360,8 @@ function is_pure_builtin(f::ANY)
f === Intrinsics.checked_srem_int ||
f === Intrinsics.checked_urem_int ||
f === Intrinsics.check_top_bit ||
f === Intrinsics.sqrt_llvm)
f === Intrinsics.sqrt_llvm ||
f === Intrinsics.cglobal) # cglobal throws an error for symbol-not-found
Sponsor Member

all of them can throw errors

Sponsor Member Author

True... it's just that the BLAS code relies on a cglobal call in void context throwing an error, which is a very atypical use of an intrinsic.
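The usage pattern being referred to, roughly (illustrative only; this is not the actual Base BLAS code, and the symbol/library names are placeholders):

function blas_symbol_present()
    try
        # cglobal throws if the symbol (or the library) cannot be resolved,
        # and that error is what the detection logic relies on.
        cglobal((:dgemm_64_, "libblas"))
        return true
    catch
        return false
    end
end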

Sponsor Member

I think we should just fix that code. I've never been super happy with it throwing an exception during every normal startup anyways.

end

function var_matches(a::Union{Slot,SSAValue}, b::Union{Slot,SSAValue})
return ((isa(a,SSAValue) && isa(b,SSAValue)) || (isa(a,Slot) && isa(b,Slot))) && a.id == b.id
Sponsor Member

might be more compact and readable just to write this using dispatch
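For reference, a sketch of the dispatch-based form being suggested here (a hypothetical rewrite, not necessarily what was committed):

var_matches(a::SSAValue, b::SSAValue) = a.id == b.id
var_matches(a::Slot, b::Slot) = a.id == b.id
var_matches(a::Union{Slot,SSAValue}, b::Union{Slot,SSAValue}) = false   # mixed kinds never match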

@@ -3657,7 +3657,7 @@ function inlineable(f::ANY, ft::ANY, e::Expr, atypes::Vector{Any}, sv::Inference
if sv.params.inlining
if isa(e.typ, Const) # || isconstType(e.typ)
if (f === apply_type || f === fieldtype || f === typeof ||
istopfunction(topmod, f, :typejoin) ||
istopfunction(topmod, f, :typejoin) || f === (===) ||
@vtjnash (Sponsor Member) Mar 1, 2017

should have its own line to be consistent with the other items here

Sponsor Member

note that === should often show up here as a Conditional object, not a Const
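A rough illustration of the case the added f === (===) branch targets: a === call whose result inference has already proven as a constant, so the comparison can be folded out of the IR instead of being left as a call (sketch only):

g() = Int === Int ? 1 : 2   # inference proves the === result, so the branch folds away

# Inspect the optimized IR with, e.g.:
#   code_typed(g, Tuple{})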

@ararslan added the compiler:inference and performance labels Mar 1, 2017
@JeffBezanson
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@JeffBezanson
Sponsor Member Author

A very frustrating inlining-related regression here: the code for a convert method got smaller, allowing it to be inlined into promote, which prevented the promote method from getting inlined, which introduced an extra tuple allocation. Will have to do something about that.

@andyferris
Member

andyferris commented Mar 2, 2017

I find @inline is a bit of a cancer... even 1-liners need an @inline because what they call might want to be inlined.

It seems to me that for one-liners (or two-liners, etc.) this should be relatively safe: only code which would have been partially inlined anyway will get inlined just one more level (per @inline method).

Thus, I think we could relatively safely add @inline to a few more places in Base (even/especially where they seem unnecessary at first glance), including for promote. Interestingly, in StaticArrays I provide method specializations which are verbatim copies of Base methods but with an @inline decoration, which makes measurable performance improvements.

(Maybe small functions need a @propagate_inline meta (just joking! :P))
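A hypothetical sketch of the StaticArrays pattern mentioned above: a package re-declaring a method for its own type as a near-verbatim copy of the generic definition, differing only in the added @inline. (The type and methods below are invented for illustration, not quoted from StaticArrays or Base.)

struct MyNumber <: Number
    x::Float64
end

Base.promote_rule(::Type{MyNumber}, ::Type{Float64}) = Float64
Base.convert(::Type{Float64}, a::MyNumber) = a.x

# The "copy of the generic definition, plus @inline" specialization:
@inline Base.promote(a::MyNumber, b::Float64) = (convert(Float64, a), b)

# promote(MyNumber(1.5), 2.0) returns (1.5, 2.0); the @inline annotation
# encourages the whole call to collapse at the use site.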

@vtjnash
Sponsor Member

vtjnash commented Mar 5, 2017

"Will have to do something about that"

It seems like maybe the inlining threshold should be (partially) based on the pre-inlined method body, or some ratio? But this is probably a discussion for a different place.
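A purely hypothetical sketch of that idea (not what the PR or Base implements): let a method's inlining budget scale with the size of its pre-inlined body, so it is not disqualified from inlining merely because its own callees were inlined into it.

function inlining_budget(pre_inline_len::Int; base::Int = 100, ratio::Float64 = 2.0)
    # allow the post-inlining body to be up to `ratio` times the original size
    return max(base, round(Int, ratio * pre_inline_len))
end

# e.g. a method that was 80 statements before inlining could grow to 160
# statements and still be considered inlineable under this rule.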

@JeffBezanson
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

this was affected by using `Const` in more cases instead of `Type{}`
don't inline into a function `f` if doing so would put it over the
inlining threshold, and if inlining `f` itself would help
avoid tuple allocations.

so far this is only used on `promote`, to limit the effects as
much as possible.
@JeffBezanson
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@JeffBezanson
Sponsor Member Author

Ok, what I did here was adjust the inlining pass to accumulate added statements into a single buffer, which I think makes the code a bit simpler, and allows us to easily observe how big the enclosing function is getting. Then I use this to avoid inlining into promote if it would make promote itself non-inlineable. This is helpful for bigints and bigfloats. There were some regressions when I applied this heuristic more widely; testing again now that it is applied only to promote.
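In rough pseudocode, the heuristic described above looks something like this (the names and the threshold value are invented for the sketch, not the actual inference.jl internals):

const INLINE_BODY_LIMIT = 100   # assumed statement budget for a method to stay inlineable

# Decide whether splicing a callee's statements into the caller is worth it,
# given that we would rather keep the caller (e.g. promote) small enough to be
# inlined itself and thereby avoid a tuple allocation at its call sites.
function ok_to_inline_into(caller_stmts::Vector{Any}, callee_stmts::Vector{Any},
                           keep_caller_inlineable::Bool)
    projected_size = length(caller_stmts) + length(callee_stmts)
    if keep_caller_inlineable && projected_size > INLINE_BODY_LIMIT
        return false   # skip: inlining here would push the caller over the threshold
    end
    return true
end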

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman
Contributor

tkelman commented Mar 8, 2017

interesting, factorizations got worse but almost everything else got better

@KristofferC
Sponsor Member

@nanosoldier runbenchmarks(ALL, vs = ":master")

Double check.

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@tkelman
Contributor

tkelman commented Mar 9, 2017

svd and ["sparse","index",("spmat","logical",100)] slowdowns look real then

@JeffBezanson
Sponsor Member Author

@nanosoldier runbenchmarks(ALL, vs = ":master")

@JeffBezanson
Sponsor Member Author

I think I figured this one out. Replacing certain slots with equivalent ssavalues was causing LLVM to emit excessive numbers of memcpys to move tuples around. This can probably be considered a quasi-bug in LLVM (SROA pass I believe), since it should have been able to figure out that a tuple should be stack allocated to begin with and left in place. Here's a sample of the IR:

julia> G=Base.Generator{Base.Iterators.Prod2{UnitRange{Int64},UnitRange{Int64}},getfield(Base,Symbol("##54#55")){Float64,NTuple{5,Array{Float64,1}}}}

julia> code_llvm(Base.collect_to!, (Matrix{Float64}, G, Int, Tuple{Int,Int,Nullable{Int},Bool}))

define i8** @"julia_collect_to!_68175"(i8**, i8**, i64, { i64, i64, %Nullable.64, i8 }*) #0 !dbg !5 {
top:
  %.sroa.281 = alloca [7 x i8], align 1
  %.sroa.979 = alloca [7 x i8], align 1
  %4 = alloca { [2 x i64], { i64, i64, %Nullable.64, i8 } }, align 8
  %.sroa.264.sroa.0 = alloca [7 x i8], align 1
  %.sroa.5 = alloca [7 x i8], align 1
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0" = alloca [7 x i8], align 1
  %"#temp#6.sroa.4.sroa.6" = alloca [7 x i8], align 1
  %st.sroa.7.0..sroa_idx = getelementptr inbounds { i64, i64, %Nullable.64, i8 }, { i64, i64, %Nullable.64, i8 }* %3, i64 0, i32 3
  %st.sroa.7.0.copyload = load i8, i8* %st.sroa.7.0..sroa_idx, align 1
  %5 = and i8 %st.sroa.7.0.copyload, 1
  %6 = icmp eq i8 %5, 0
  br i1 %6, label %if.lr.ph, label %L72

if.lr.ph:                                         ; preds = %top
  %7 = bitcast i8** %0 to double**
  %8 = load double*, double** %7, align 8
  %st.sroa.6.0..sroa_idx26 = getelementptr inbounds { i64, i64, %Nullable.64, i8 }, { i64, i64, %Nullable.64, i8 }* %3, i64 0, i32 2, i32 1
  %st.sroa.6.0.copyload = load i64, i64* %st.sroa.6.0..sroa_idx26, align 1
  %st.sroa.4.0..sroa_idx = getelementptr inbounds { i64, i64, %Nullable.64, i8 }, { i64, i64, %Nullable.64, i8 }* %3, i64 0, i32 2, i32 0
  %st.sroa.4.0.copyload = load i8, i8* %st.sroa.4.0..sroa_idx, align 1
  %st.sroa.3.0..sroa_idx18 = getelementptr inbounds { i64, i64, %Nullable.64, i8 }, { i64, i64, %Nullable.64, i8 }* %3, i64 0, i32 1
  %st.sroa.3.0.copyload = load i64, i64* %st.sroa.3.0..sroa_idx18, align 1
  %st.sroa.0.0..sroa_idx = getelementptr inbounds { i64, i64, %Nullable.64, i8 }, { i64, i64, %Nullable.64, i8 }* %3, i64 0, i32 0
  %st.sroa.0.0.copyload = load i64, i64* %st.sroa.0.0..sroa_idx, align 1
  %9 = getelementptr i8*, i8** %1, i64 1
  %10 = getelementptr i8*, i8** %1, i64 2
  %11 = bitcast i8** %10 to i64*
  %12 = bitcast i8** %9 to i64*
  %.sroa.264.sroa.0.0..sroa_idx = getelementptr inbounds [7 x i8], [7 x i8]* %.sroa.264.sroa.0, i64 0, i64 0
  %13 = getelementptr i8*, i8** %1, i64 4
  %14 = bitcast i8** %13 to i64*
  %.sroa.3.sroa.5.33..sroa.5.0..sroa_idx.sroa_idx = getelementptr inbounds [7 x i8], [7 x i8]* %.sroa.5, i64 0, i64 0
  %".sroa.3.sroa.3.sroa.2.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0..sroa_idx85.sroa_idx" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0", i64 0, i64 0
  %"#temp#6.sroa.4.sroa.6.33..sroa_idx" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.6", i64 0, i64 0
  %"#temp#6.sroa.0.0..sroa_idx" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 0, i64 0
  %"#temp#6.sroa.3.0..sroa_idx43" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 0, i64 1
  %"#temp#6.sroa.4.sroa.0.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 1, i32 0
  %"#temp#6.sroa.4.sroa.3.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx57" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 1, i32 1
  %"#temp#6.sroa.4.sroa.4.sroa.0.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_idx" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 1, i32 2, i32 0
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_raw_cast" = bitcast { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4 to i8*
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_raw_idx" = getelementptr inbounds i8, i8* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_raw_cast", i64 33
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.3.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_idx87" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 1, i32 2, i32 1
  %"#temp#6.sroa.4.sroa.5.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx" = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 1, i32 3
  %"#temp#6.sroa.4.sroa.6.0.#temp#6.sroa.4.0..sroa_cast.sroa_raw_idx" = getelementptr inbounds i8, i8* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_raw_cast", i64 49
  %15 = bitcast i8** %1 to i8***
  %16 = getelementptr inbounds { [2 x i64], { i64, i64, %Nullable.64, i8 } }, { [2 x i64], { i64, i64, %Nullable.64, i8 } }* %4, i64 0, i32 0
  %.sroa.677.sroa.0.0..sroa.281.1..sroa_idx.sroa_idx = getelementptr inbounds [7 x i8], [7 x i8]* %.sroa.281, i64 0, i64 0
  %.sroa.979.33..sroa_idx = getelementptr inbounds [7 x i8], [7 x i8]* %.sroa.979, i64 0, i64 0
  br label %if

if:                                               ; preds = %if.lr.ph, %L44
  %i.095 = phi i64 [ %2, %if.lr.ph ], [ %32, %L44 ]
  %st.sroa.0.094 = phi i64 [ %st.sroa.0.0.copyload, %if.lr.ph ], [ %st.sroa.0.0.copyload17, %L44 ]
  %st.sroa.3.093 = phi i64 [ %st.sroa.3.0.copyload, %if.lr.ph ], [ %st.sroa.3.0.copyload20, %L44 ]
  %st.sroa.4.092 = phi i8 [ %st.sroa.4.0.copyload, %if.lr.ph ], [ %st.sroa.4.0.copyload22, %L44 ]
  %st.sroa.6.091 = phi i64 [ %st.sroa.6.0.copyload, %if.lr.ph ], [ %st.sroa.6.0.copyload28, %L44 ]
  %17 = and i8 %st.sroa.4.092, 1
  %18 = icmp eq i8 %17, 0
  br i1 %18, label %if7, label %L35

L72.loopexit:                                     ; preds = %L44
  br label %L72

L72:                                              ; preds = %L72.loopexit, %top
  ret i8** %0

if7:                                              ; preds = %if
  %19 = add i64 %st.sroa.3.093, 1
  br label %L35

L35:                                              ; preds = %if, %if7
  %s24.0 = phi i64 [ %19, %if7 ], [ %st.sroa.3.093, %if ]
  %v2.0 = phi i64 [ %st.sroa.3.093, %if7 ], [ %st.sroa.6.091, %if ]
  %20 = load i64, i64* %11, align 8
  %21 = icmp eq i64 %st.sroa.0.094, %20
  br i1 %21, label %if8, label %L41

if8:                                              ; preds = %L35
  %22 = load i64, i64* %12, align 1
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.097" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0", i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.097", i8* %.sroa.264.sroa.0.0..sroa_idx, i64 7, i32 1, i1 false)
  %23 = load i64, i64* %14, align 8
  %24 = add i64 %23, 1
  %25 = icmp eq i64 %s24.0, %24
  %26 = zext i1 %25 to i8
  %"#temp#6.sroa.4.sroa.698" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.6", i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.698", i8* %.sroa.3.sroa.5.33..sroa.5.0..sroa_idx.sroa_idx, i64 7, i32 1, i1 false)
  br label %L44

L41:                                              ; preds = %L35
  %27 = add i64 %st.sroa.0.094, 1
  %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0100" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0", i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0100", i8* %.sroa.677.sroa.0.0..sroa.281.1..sroa_idx.sroa_idx, i64 7, i32 1, i1 false)
  %"#temp#6.sroa.4.sroa.6101" = getelementptr inbounds [7 x i8], [7 x i8]* %"#temp#6.sroa.4.sroa.6", i64 0, i64 0
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.6101", i8* %.sroa.979.33..sroa_idx, i64 7, i32 1, i1 false)
  br label %L44

L44:                                              ; preds = %L41, %if8
  %"#temp#6.sroa.4.sroa.0.0" = phi i64 [ %22, %if8 ], [ %27, %L41 ]
  %"#temp#6.sroa.4.sroa.5.0" = phi i8 [ %26, %if8 ], [ 0, %L41 ]
  %"#temp#6.sroa.4.sroa.4.sroa.0.0" = phi i8 [ 0, %if8 ], [ 1, %L41 ]
  store i64 %st.sroa.0.094, i64* %"#temp#6.sroa.0.0..sroa_idx", align 8
  store i64 %v2.0, i64* %"#temp#6.sroa.3.0..sroa_idx43", align 8
  store i64 %"#temp#6.sroa.4.sroa.0.0", i64* %"#temp#6.sroa.4.sroa.0.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx", align 8
  store i64 %s24.0, i64* %"#temp#6.sroa.4.sroa.3.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx57", align 8
  store i8 %"#temp#6.sroa.4.sroa.4.sroa.0.0", i8* %"#temp#6.sroa.4.sroa.4.sroa.0.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_idx", align 8
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_raw_idx", i8* %".sroa.3.sroa.3.sroa.2.sroa.0.0.#temp#6.sroa.4.sroa.4.sroa.3.sroa.0.0..sroa_idx85.sroa_idx", i64 7, i32 1, i1 false)
  store i64 %v2.0, i64* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.3.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_idx87", align 8
  store i8 %"#temp#6.sroa.4.sroa.5.0", i8* %"#temp#6.sroa.4.sroa.5.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx", align 8
  call void @llvm.memcpy.p0i8.p0i8.i64(i8* %"#temp#6.sroa.4.sroa.6.0.#temp#6.sroa.4.0..sroa_cast.sroa_raw_idx", i8* %"#temp#6.sroa.4.sroa.6.33..sroa_idx", i64 7, i32 1, i1 false)
  %28 = load i8**, i8*** %15, align 8
  %29 = call double @"julia_#54_68176"(i8** %28, [2 x i64]* %16)
  %st.sroa.0.0.copyload17 = load i64, i64* %"#temp#6.sroa.4.sroa.0.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx", align 8
  %st.sroa.3.0.copyload20 = load i64, i64* %"#temp#6.sroa.4.sroa.3.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx57", align 8
  %st.sroa.4.0.copyload22 = load i8, i8* %"#temp#6.sroa.4.sroa.4.sroa.0.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_idx", align 8
  %st.sroa.6.0.copyload28 = load i64, i64* %"#temp#6.sroa.4.sroa.4.sroa.3.sroa.3.0.#temp#6.sroa.4.sroa.4.sroa.3.0.#temp#6.sroa.4.sroa.4.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx.sroa_raw_idx.sroa_idx87", align 8
  %st.sroa.7.0.copyload30 = load i8, i8* %"#temp#6.sroa.4.sroa.5.0.#temp#6.sroa.4.0..sroa_cast.sroa_idx", align 8
  %30 = add i64 %i.095, -1
  %31 = getelementptr double, double* %8, i64 %30
  store double %29, double* %31, align 8
  %32 = add i64 %i.095, 1
  %33 = and i8 %st.sroa.7.0.copyload30, 1
  %34 = icmp eq i8 %33, 0
  br i1 %34, label %if, label %L72.loopexit
}

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here. cc @jrevels

@JeffBezanson
Sponsor Member Author

Much better, but still a couple regressions. I'll try tweaking the new rule here.

@JeffBezanson
Sponsor Member Author

Ok, the first 2 regressions seem to be noise; I don't see any code differences. The sparse indexing regression is real, but a bit perverse: the simpler code this PR generates allows LLVM to vectorize over some of the integer vectors, which seems to cause a slight slowdown. A vectorization cost model issue I suppose. I'm not sure if we can or should do anything about it; that would amount to copying values from arrays into unnecessary temporary variables in hope of blocking vectorization, but we don't know when that would be profitable any better than LLVM does.

To reproduce the LLVM IR:

julia> m = sprand(100,100,0.1);

julia> j = find(rand(Bool,100));

julia> @code_llvm Base.SparseArrays.getindex_I_sorted_linear(m, j, j)

@Sacha0
Member

Sacha0 commented Mar 10, 2017

Sounds best to ignore the single, minor sparse indexing performance regression then? The widespread improvements this change provides are fantastic.

@KristofferC
Sponsor Member

I strongly agree

@JeffBezanson merged commit 0e970f0 into master Mar 10, 2017
@tkelman deleted the jb/IRcleanup branch March 10, 2017 20:14
@StefanKarpinski
Sponsor Member

Seems like the best course of action is to merge and then open an issue about the regression.

@tkelman
Contributor

tkelman commented Mar 10, 2017

Probably report the IR example upstream, if vectorization is causing a slowdown.

vtjnash added a commit that referenced this pull request Jul 9, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.
vtjnash added a commit that referenced this pull request Jul 13, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.
vtjnash added a commit that referenced this pull request Jul 20, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.
vtjnash added a commit that referenced this pull request Sep 1, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.
vtjnash added a commit that referenced this pull request Sep 3, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.
c42f pushed a commit that referenced this pull request Sep 23, 2020
Added in #22210 (and earlier begun in #20853), this is no longer
necessary to avoid heap allocations, and thus serves little purpose now.