reorder ml-matches to avoid catastrophic performance case #49664

Merged
2 commits merged into master from jn/ml-matches-rewritten on May 15, 2023

Conversation

@vtjnash (Member) commented May 6, 2023

This reordering of the algorithm abandons the elegant insertion in favor of using another copy of Tarjan's SCC code. This lets us abort the algorithm in O(k*n) time, instead of always running the full O(n*n) time, where k is min(lim, n).
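The lim argument is what makes the early abort possible: -1 asks for every match, while a non-negative lim lets the search give up as soon as more than lim methods match. A minimal usage sketch of the internal entry point used in the timings below (the exact sentinel returned when the limit is exceeded is an internal detail that has varied across Julia versions, e.g. nothing or false):

julia> sig = Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}};

julia> world = Base.get_world_counter();

julia> matches = Base._methods_by_ftype(sig, -1, world);   # lim = -1: collect every matching method

julia> Base._methods_by_ftype(sig, 3, world);              # lim = 3: may give up once more than 3 methods match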

For example, to sort 1338 methods:

Before:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.136609 seconds (22.74 k allocations: 1.104 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.046280 seconds (9.95 k allocations: 497.453 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.132588 seconds (22.73 k allocations: 1.103 MiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.135912 seconds (22.73 k allocations: 1.103 MiB)

After:
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 3, Base.get_world_counter());
  0.001040 seconds (1.47 k allocations: 88.375 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, -1, Base.get_world_counter());
  0.039167 seconds (8.24 k allocations: 423.984 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.081354 seconds (8.26 k allocations: 424.734 KiB)
julia> @time Base._methods_by_ftype(Tuple{typeof(Core.kwcall), NamedTuple, Any, Vararg{Any}}, 30000, Base.get_world_counter());
  0.080849 seconds (8.26 k allocations: 424.734 KiB)

This also makes inference faster in rare cases (this particular example came up because the expression below appears in @test macroexpansion), both before loading more packages, such as OmniPackage, and after: with this change the cost stays almost unchanged after loading, versus increasing about 50x without it.

julia> f() = x(args...; kwargs...); @time @code_typed optimize=false f();
  0.143523 seconds (23.25 k allocations: 1.128 MiB, 99.96% compilation time) # before
  0.001172 seconds (1.86 k allocations: 108.656 KiB, 97.71% compilation time) # after

@vtjnash force-pushed the jn/ml-matches-rewritten branch 2 times, most recently from 81c3fad to 2e90105 on May 12, 2023 15:18
This now chooses the optimal SCC set based on the size of lim, which ensures the algorithm runs in well under O(n^2) time in all reasonable cases. The algorithm we are using is O(n + e), where computing e may require up to n^2 work in the worst case, but should require only about n*min(lim, log(n)) work in the expected average case.
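As a back-of-the-envelope illustration with the 1338-method benchmark above (taking log base 2 for concreteness; this is only arithmetic on the estimate, not a measurement):

julia> n = 1338;

julia> n^2                                  # worst-case edge work
1790244

julia> n * min(3, ceil(Int, log2(n)))       # expected work with lim = 3
4014

julia> n * min(30000, ceil(Int, log2(n)))   # expected work when lim is effectively unlimited
14718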

This also further pre-optimizes quick work (checking for existing coverage) and delays unnecessary work (computing the *ambig return).
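For intuition, here is a minimal, self-contained Julia sketch of the SCC-with-early-abort idea. This is an illustration only, not the PR's code: the real implementation is C in src/gf.c, operating on method-match entries and their type-intersection edges, and the names below (scc_with_limit, adj) are made up for the example.

# Tarjan's strongly connected components with an early-abort limit.
# adj[v] lists the vertices v has edges to. Returns the SCCs in reverse topological
# order, or nothing as soon as more than lim vertices have been emitted
# (lim == -1 means no limit), mirroring how ml-matches can give up once more than
# lim methods would be returned.
function scc_with_limit(adj::Vector{Vector{Int}}, lim::Int)
    n = length(adj)
    index = zeros(Int, n)          # discovery index; 0 means "not yet visited"
    lowlink = zeros(Int, n)
    onstack = falses(n)
    stack = Int[]
    sccs = Vector{Vector{Int}}()
    counter = Ref(0)
    emitted = Ref(0)

    function strongconnect(v)
        counter[] += 1
        index[v] = lowlink[v] = counter[]
        push!(stack, v)
        onstack[v] = true
        for w in adj[v]
            if index[w] == 0
                strongconnect(w) === nothing && return nothing   # propagate an abort
                lowlink[v] = min(lowlink[v], lowlink[w])
            elseif onstack[w]
                lowlink[v] = min(lowlink[v], index[w])
            end
        end
        if lowlink[v] == index[v]                # v is the root of a completed SCC
            comp = Int[]
            while true
                w = pop!(stack)
                onstack[w] = false
                push!(comp, w)
                w == v && break
            end
            emitted[] += length(comp)
            if lim >= 0 && emitted[] > lim       # early abort: more than lim entries emitted
                return nothing
            end
            push!(sccs, comp)
        end
        return sccs
    end

    for v in 1:n
        if index[v] == 0
            strongconnect(v) === nothing && return nothing
        end
    end
    return sccs
end

julia> adj = [[2], [3], [2]];    # toy graph: 2 and 3 point at each other (a cycle, i.e. mutually "ambiguous")

julia> scc_with_limit(adj, -1)   # no limit: the cycle {2,3} comes out as one SCC
2-element Vector{Vector{Int64}}:
 [3, 2]
 [1]

julia> scc_with_limit(adj, 1)    # more than 1 entry would be emitted, so it gives up (returns nothing)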
@vtjnash (Member, Author) commented May 12, 2023

@nanosoldier runbenchmarks(!"scalar" && !"union", vs=":master")

@nanosoldier (Collaborator) commented

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@vtjnash (Member, Author) commented May 13, 2023

As expected, this should be a big win for inference.

@vtjnash merged commit fbbe9ed into master on May 15, 2023
@vtjnash deleted the jn/ml-matches-rewritten branch on May 15, 2023 14:38
Labels: compiler:latency (Compiler latency)