more type-stable type-inference #41697

Merged · merged 1 commit into master on Jul 29, 2021
Conversation

**aviatesk** commented:


(this PR is the final output of my demo at [our workshop](https://github.com/aviatesk/juliacon2021-workshop-pkgdev))

This PR eliminates many of the runtime dispatches within our type-inference
routine that are reported by the following JET analysis:
```julia
using JETTest

const CC = Core.Compiler

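# ignore runtime dispatches to these known helper functions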
function function_filter(@nospecialize(ft))
    ft === typeof(CC.isprimitivetype) && return false
    ft === typeof(CC.ismutabletype) && return false
    ft === typeof(CC.isbitstype) && return false
    ft === typeof(CC.widenconst) && return false
    ft === typeof(CC.widenconditional) && return false
    ft === typeof(CC.widenwrappedconditional) && return false
    ft === typeof(CC.maybe_extract_const_bool) && return false
    ft === typeof(CC.ignorelimited) && return false
    return true
end

# only report inference frames whose method is defined under the compiler/ directory
function frame_filter((; linfo))
    meth = linfo.def
    isa(meth, Method) || return true
    return occursin("compiler/", string(meth.file))
end

report_dispatch(CC.typeinf, (CC.NativeInterpreter, CC.InferenceState); function_filter, frame_filter)
```

> on master
```
═════ 137 possible errors found ═════
...
```
> on this PR
```
═════ 51 possible errors found ═════
...
```
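To illustrate the kind of problem these reports flag, here is a minimal, hypothetical example (not code from this PR): a call on an abstractly typed value is a runtime dispatch, and a type assertion is one way to remove it.
```julia
# Hypothetical illustration, not from this PR: `b.x` is abstractly typed,
# so the call to `+` cannot be resolved statically -- a runtime dispatch,
# which is the kind of call site report_dispatch flags.
struct Box
    x::Any
end
slow_inc(b::Box) = b.x + 1

# A type assertion gives inference a concrete type, so `+` is resolved
# statically and the runtime dispatch disappears:
fast_inc(b::Box) = (b.x::Int) + 1
```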

And it seems like this PR makes JIT slightly faster:
> on master
```julia
~/julia/julia master
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3));'
  3.659865 seconds (7.19 M allocations: 497.982 MiB, 3.94% gc time, 0.39% compilation time)
  2.696410 seconds (3.62 M allocations: 202.905 MiB, 7.49% gc time, 56.39% compilation time)
```
> on this PR
```julia
~/julia/julia avi/jetdemo* 7s
❯ ./usr/bin/julia -e '@time using Plots; @time plot(rand(10,3));'
  3.396974 seconds (7.16 M allocations: 491.442 MiB, 4.80% gc time, 0.28% compilation time)
  2.591130 seconds (3.48 M allocations: 196.026 MiB, 7.29% gc time, 56.72% compilation time)
```
Review thread on the following hunk in `abstract_call_gf_by_type`:
```julia
@@ -85,9 +85,12 @@ function abstract_call_gf_by_type(interp::AbstractInterpreter, @nospecialize(f),
                push!(edges, edge)
            end
            this_argtypes = isa(matches, MethodMatches) ? argtypes : matches.applicable_argtypes[i]
            const_rt, const_result = abstract_call_method_with_const_args(interp, result, f, this_argtypes, match, sv, false)
```
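For context, the thread below concerns the shape of the value returned by `abstract_call_method_with_const_args`. A minimal sketch (with invented placeholder values, not the actual compiler code) of the two shapes discussed:
```julia
# Hypothetical sketch: two shapes a helper can use to signal
# "no constant-propagation result" to its callers.

# (a) always return a 2-tuple, with `nothing` in the second slot on failure;
#     the inferred return type is a union of tuple types, and call sites
#     destructure first, then test one element:
helper_a(ok::Bool) = ok ? (Int, :an_inference_result) : (Any, nothing)

# (b) return `nothing` itself on failure; the inferred return type is
#     Union{Nothing, Tuple{...}}, and call sites test first, then destructure:
helper_b(ok::Bool) = ok ? (Int, :an_inference_result) : nothing

# call-site patterns:
rt_a, res_a = helper_a(true)          # destructure, then check res_a === nothing
y = helper_b(true)
y === nothing || ((rt_b, res_b) = y)  # check, then destructure
```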
**Reviewer** commented:

This change is less type stable, so it seems harder to optimize. Why is it better?

**aviatesk** (author) replied:

The main reason is that our optimizer emits rather complex code when we destructure a union-typed tuple:

```julia
julia> code_typed((Union{Tuple{Int,Nothing}, Tuple{Integer,String}},)) do tpl
           if getfield(tpl, 2) !== nothing
               a, b = tpl
               return a, b
           end
           return a, b
       end |> first
CodeInfo(
1 ─ %1  = Main.getfield(tpl, 2)::Union{Nothing, String}
│   %2  = (%1 === Main.nothing)::Bool
│   %3  = Core.Intrinsics.not_int(%2)::Bool
└──       goto #9 if not %3
2 ─ %5  = (isa)(tpl, Tuple{Int64, Nothing})::Bool
└──       goto #4 if not %5
3 ─ %7  = π (tpl, Tuple{Int64, Nothing})
│   %8  = Base.getfield(%7, 1)::Union{Nothing, Int64}
│   %9  = Base.add_int(1, 1)::Int64
│   %10 = Core.tuple(%8, %9)::Tuple{Union{Nothing, Int64}, Int64}
└──       goto #5
4 ─ %12 = Base.indexed_iterate(tpl, 1)::Tuple{Integer, Int64}
└──       goto #5
5 ┄ %14 = φ (#3 => %10, #4 => %12)::Tuple{Integer, Int64}
│   %15 = Core.getfield(%14, 1)::Integer
│   %16 = (isa)(tpl, Tuple{Int64, Nothing})::Bool
└──       goto #7 if not %16
6 ─ %18 = π (tpl, Tuple{Int64, Nothing})
│   %19 = Base.getfield(%18, 2)::Union{Nothing, Int64}
│   %20 = Base.add_int(2, 1)::Int64
│   %21 = Core.tuple(%19, %20)::Tuple{Union{Nothing, Int64}, Int64}
└──       goto #8
7 ─ %23 = Base.indexed_iterate(tpl, 2, 2)::Union{Tuple{Nothing, Int64}, Tuple{String, Int64}}
└──       goto #8
8 ┄ %25 = φ (#6 => %21, #7 => %23)::Union{Tuple{Nothing, Int64}, Tuple{String, Int64}}
│   %26 = Core.getfield(%25, 1)::Union{Nothing, String}
│   %27 = Core.tuple(%15, %26)::Tuple{Integer, Union{Nothing, String}}
└──       return %27
9 ─       $(Expr(:throw_undef_if_not, :a, false))::Any
└──       unreachable
) => Tuple{Integer, Union{Nothing, String}}

julia> code_typed((Union{Nothing, Tuple{Integer,String}},)) do tpl
           if tpl !== nothing
               a, b = tpl
               return a, b
           end
           return nothing
       end |> first
CodeInfo(
1 ─ %1 = (tpl === Main.nothing)::Bool
│   %2 = Core.Intrinsics.not_int(%1)::Bool
└──      goto #3 if not %2
2 ─ %4 = π (tpl, Tuple{Integer, String})
│   %5 = Base.getfield(%4, 1)::Integer
│   %6 = π (tpl, Tuple{Integer, String})
│   %7 = Base.getfield(%6, 2)::String
│   %8 = Core.tuple(%5, %7)::Tuple{Integer, String}
└──      return %8
3 ─      return Main.nothing
) => Union{Nothing, Tuple{Integer, String}}
```
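(For reference, `a, b = tpl` lowers to calls to `Base.indexed_iterate`, which is why those calls appear in the IR above. A rough, simplified sketch of the lowering:)
```julia
# roughly what `a, b = tpl` lowers to (simplified):
tpl = (1, "two")
st1 = Base.indexed_iterate(tpl, 1)          # returns (tpl[1], next iteration state)
a = st1[1]
st2 = Base.indexed_iterate(tpl, 2, st1[2])  # returns (tpl[2], next iteration state)
b = st2[1]
```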

So we can get a performance improvement if we branch on `nothing` and avoid the union-typed tuple:

```julia
julia> using BenchmarkTools

julia> function f1(ts)
           local a, b
           for t in ts
               if getfield(t, 2) !== nothing
                   a, b = t
               end
           end
           return a, b
       end
f1 (generic function with 1 method)

julia> function f2(ts)
           local a, b
           for t in ts
               if t !== nothing
                   a, b = t
               end
           end
           return a, b
       end
f2 (generic function with 1 method)

julia> let ts = subtypes(Any)
           # paired inputs: ts1 holds union-typed tuples, ts2 holds nothing-or-tuple
           ts1 = Union{Tuple{DataType,Nothing},Tuple{Any,Char}}[]
           ts2 = Union{Nothing,Tuple{Any,Char}}[]
           for _ in 1:1000
               if rand(Bool)
                   push!(ts1, (Any, nothing)); push!(ts2, nothing)
               else
                   t = rand(ts), rand(Char)
                   push!(ts1, t); push!(ts2, t)
               end
           end
           display(@benchmark(f1($ts1)))
           display(@benchmark(f2($ts2)))
       end
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  123.927 μs …   4.081 ms  ┊ GC (min … max): 0.00% … 95.08%
 Time  (median):     130.366 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   150.662 μs ± 107.727 μs  ┊ GC (mean ± σ):  2.69% ±  3.84%

  ▇█▇▅▄▃▂▁▂▂▂▂▂▂▁▁ ▁▂▂▁▁▁    ▁                                  ▂
  █████████████████████████████████▆█▇█▇▆▇▇▇▆▇▇▆▆▆▆▆▇▅▆▅▅▅▅▅▄▅▅ █
  124 μs        Histogram: log(frequency) by time        308 μs <

 Memory estimate: 82.19 KiB, allocs estimate: 3347.
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  17.817 μs … 145.613 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     18.958 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   21.272 μs ±   8.474 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▆█▆▄▁▁                                                       ▁
  ███████▇▇▇▇▇▇▇▇▇▇▇▇▆▆▆▆▅▅▅▆▄▅▄▄▅▆▅▆▆▆▆▇▇▇▆▆▆▇▇▇▇▆▆▃▃▃▄▄▂▅▆▅▅ █
  17.8 μs       Histogram: log(frequency) by time      57.6 μs <

 Memory estimate: 8.08 KiB, allocs estimate: 516.
```

**Reviewer** commented:

Ah, that is not what I expected that code to infer to; I was expecting `Tuple{Any, Union{Char, Nothing}}`. For that particular tuple we could then potentially add codegen optimizations, so that we don't box it immediately.

**aviatesk** (author) replied on Jul 29, 2021:

Yes, I agree that special handling of union-typed tuples in the optimizer would be worth doing.
For now I would like to have this change (currently the return type of `abstract_call_method_with_const_args` is `Union{Tuple{DataType, Nothing}, Tuple{Any, InferenceResult}}`).

**Reviewer** commented:

Ah, I see. I did not expect that inference could do it so accurately:
```julia
julia> Base.return_types(Core.Compiler.abstract_call_method_with_const_args)
1-element Vector{Any}:
 Union{Tuple{Any, Core.Compiler.InferenceResult}, Tuple{DataType, Nothing}}
```

**aviatesk** (author) replied:

Yes. Do you think we can move this PR forward?

@vtjnash merged commit 795935f into master on Jul 29, 2021, and deleted the avi/jetdemo branch (July 29, 2021 19:04).
aviatesk added a commit to aviatesk/JET.jl that referenced this pull request on Jul 30, 2021.
@vtjnash mentioned this pull request on Sep 9, 2021.
aviatesk added a commit referencing this pull request on Sep 9, 2021, cherry-picked from 795935f.