Try to speed up LateLowerGCFrame::ComputeLiveness #44463

Merged (1 commit) on Mar 8, 2022


chriselrod
Contributor

Co-authored-by: Jameson Nash <[email protected]>
Co-authored-by: Oscar Smith <oscardssmith.gmail.com>

Reduces allocations (malloc calls) in the Late Lower GCFrame pass.
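For readers unfamiliar with why allocations matter in a liveness computation, here is a minimal sketch of the general idea. It is not the diff from this PR and does not use LLVM's BitVector or any of the pass's real data structures: `Bits`, `Block`, and `compute_liveness` are hypothetical names made up for illustration. It runs the usual backward liveness equations to a fixed point, but hoists the one temporary bitset out of the loop and updates the per-block sets in place, so the iteration itself never calls malloc.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bitset stored as 64-bit words (NOT LLVM's BitVector).
using Bits = std::vector<std::uint64_t>;

// Hypothetical basic block; every bitset is sized to `nwords` by the caller.
struct Block {
    Bits use, def;                   // upward-exposed uses / definitions
    Bits live_in, live_out;          // results, updated in place
    std::vector<std::size_t> succs;  // indices of successor blocks
};

// Backward liveness to a fixed point:
//   live_out[b] = union of live_in[s] over successors s
//   live_in[b]  = use[b] | (live_out[b] & ~def[b])
// The only temporary is `scratch`, allocated once before the loop.
void compute_liveness(std::vector<Block>& blocks, std::size_t nwords) {
    Bits scratch(nwords);  // single allocation, reused for every block
    bool changed = true;
    while (changed) {
        changed = false;
        // Visit blocks in reverse index order; any order converges.
        for (std::size_t i = blocks.size(); i-- > 0;) {
            Block& b = blocks[i];
            // Build the new live_out in the reused scratch buffer.
            std::fill(scratch.begin(), scratch.end(), 0);
            for (std::size_t s : b.succs)
                for (std::size_t w = 0; w < nwords; ++w)
                    scratch[w] |= blocks[s].live_in[w];
            // Write results back word by word, tracking whether anything moved.
            for (std::size_t w = 0; w < nwords; ++w) {
                const std::uint64_t out = scratch[w];
                const std::uint64_t in = b.use[w] | (out & ~b.def[w]);
                if (out != b.live_out[w] || in != b.live_in[w])
                    changed = true;
                b.live_out[w] = out;
                b.live_in[w] = in;
            }
        }
    }
}
```

The contrast is with a version that constructs a fresh bitset for every block on every iteration; with thousands of tracked values and many fixed-point rounds, those short-lived allocations can add up.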

@oscardssmith
Member

I can confirm this is a significant improvement. I'll see if I can come up with a simple example that exhibits this.

@oscardssmith added the compiler:latency (Compiler latency) and performance (Must go faster) labels on Mar 5, 2022
@oscardssmith
Member

I haven't been able to come up with a great MWE for this, but the following shows a 2x improvement (on the pass runtime, not the whole runtime).

n = 1500
f(var1, var2, var3) = (var1>1.1) ? (var2+var3) : (var2*var1)
vars = [Symbol("var$i") for i in 1:n]
s = :(struct x
    $([:($(vars[i])::Any) for i in 1:n]...)
    function x(;d...)
        var1 = get(d, :var1, rand(1:4))
        var2 = get(d, :var2, -rand())
        $(Expr[:(
            $(vars[i]) = get(d, $(QuoteNode(vars[i]))) do;
                f($(vars[i-1]), $(vars[i-2]), $(vars[rand(1:div(i,2))]))
        end) for i in 3:n]...)
        new($(vars...))
    end
end)
eval(s)
x(var3=4)

On master we get

oscardssmith:julia$ JULIA_LLVM_ARGS=--time-passes julia test.jl tee 2>&1 |head -n20
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 27.8093 seconds (27.3734 wall clock)
   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  14.5054 ( 57.6%)   0.4789 ( 18.3%)  14.9843 ( 53.9%)  14.9414 ( 54.6%)  X86 DAG->DAG Instruction Selection
   1.2221 (  4.9%)   0.4556 ( 17.4%)   1.6777 (  6.0%)   1.6346 (  6.0%)  X86 Assembly Printer
   1.4528 (  5.8%)   0.0712 (  2.7%)   1.5240 (  5.5%)   1.5226 (  5.6%)  Late Lower GCFrame Pass

and on the PR, we get

oscardssmith:julia$ JULIA_LLVM_ARGS=--time-passes julia test.jl tee 2>&1 |head -n20
===-------------------------------------------------------------------------===
                      ... Pass execution timing report ...
===-------------------------------------------------------------------------===
  Total Execution Time: 14.5055 seconds (14.0426 wall clock)

   ---User Time---   --System Time--   --User+System--   ---Wall Time---  --- Name ---
  10.1672 ( 73.3%)   0.1061 ( 16.8%)  10.2733 ( 70.8%)  10.2115 ( 72.7%)  X86 DAG->DAG Instruction Selection
   2.2113 ( 15.9%)   0.0085 (  1.3%)   2.2198 ( 15.3%)   2.2188 ( 15.8%)  SLP Vectorizer
   0.3562 (  2.6%)   0.0467 (  7.4%)   0.4029 (  2.8%)   0.3893 (  2.8%)  Greedy Register Allocator
   0.2686 (  1.9%)   0.0202 (  3.2%)   0.2889 (  2.0%)   0.2839 (  2.0%)  Late Lower GCFrame Pass

There's definitely more to improve here, but this does appear to be a noticeable and welcome speedup.

@oscardssmith
Member

Wait, I just looked at this again and it makes absolutely no sense. The Late Lower GCFrame pass did get a bunch faster, but the total execution time also almost halved, which the pass speedup alone can't account for. @vtjnash any ideas?
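To put numbers on that mismatch, here is a small sketch that only does arithmetic on the wall-clock figures from the two --time-passes reports above. The names `Timing`, `speedup`, and `saved` are made up for this example. On those figures the pass itself is roughly 5.4x faster, but it accounts for only about 1.24 s of the roughly 13.3 s drop in total wall time, so most of the difference comes from somewhere else (the reports are from single runs, so run-to-run variation is one possibility).

```cpp
#include <cstdio>

// Wall-clock seconds copied from the two timing reports in this thread.
struct Timing {
    double total;    // "Total Execution Time" wall clock
    double gc_pass;  // "Late Lower GCFrame Pass" wall time
};

constexpr Timing master{27.3734, 1.5226};
constexpr Timing pr{14.0426, 0.2839};

// Ratio of old time to new time; values above 1 mean the PR is faster.
constexpr double speedup(double before, double after) { return before / after; }

// Absolute seconds saved going from `before` to `after`.
constexpr double saved(double before, double after) { return before - after; }

// Print the pass-level and whole-run comparison side by side.
void print_summary() {
    const double pass_saved = saved(master.gc_pass, pr.gc_pass);
    const double total_saved = saved(master.total, pr.total);
    std::printf("pass speedup:  %.2fx\n", speedup(master.gc_pass, pr.gc_pass));
    std::printf("total speedup: %.2fx\n", speedup(master.total, pr.total));
    std::printf("pass saved %.4f s of %.4f s total (%.1f%%)\n",
                pass_saved, total_saved, 100.0 * pass_saved / total_saved);
}
```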

@vtjnash vtjnash merged commit c4409c5 into JuliaLang:master Mar 8, 2022
@chriselrod chriselrod deleted the slightlyfastercomputeliveness branch March 8, 2022 01:12