
Cache external CodeInstances #43990

Merged: 7 commits, Feb 24, 2022
Conversation

@timholy commented Jan 30, 2022

Builds on #43793 and #43881. I tested each package with a task for which it was precompiled (I've dev'ed most of these; see the diffs that force precompilation here; ModelingToolkit's is here). TTFT = Time To Finish Task.

EDIT: latest benchmarks are in #43990 (comment)

All times are in seconds.

| Package | Load (nightly) | Load (PR) | Load update (PR) | TTFT (nightly) | TTFT (PR) |
| --- | --- | --- | --- | --- | --- |
| CSV | 2.6 | 2.0 | | 15.8 | 12.9 |
| DataFrames | 1.3 | 1.5 | | 12.5 | 10.4 |
| Plots | 4.1 | 6.7 | 4.6 | 8.9 | 9.5 |
| GLMakie | 7.3 | 15.4 | 11.4 | 60.4 | 42.1 |
| OrdinaryDiffEq | 6.8 | 8.8 | | 1.4 | 1.3 |
| ModelingToolkit | 11.8 | 17.1 | | 36.7 | 20.3 |
| Flux | 7.5 | 8.7 | 8.2 | 24.9 | 15.7 |
| JuMP | 4.8 | 5.0 | | 2.5 | 1.6 |
| ImageFiltering | 2.5 | 3.0 | | 1.2 | 1.2 |
| LV (see below) | 2.5 | 2.9 | | 8.6 | 0.2 |

"Load update" includes the changes in #43990 (comment). (If not reported, they seemed unchanged from the previous measurement.)

Looks like a clear win. (It's not obvious that the TTFT difference for Plots is significant; I've never replicated that gap.)

Closes #42016
Fixes #35972 and possibly the other issues listed in JuliaGraphics/Colors.jl#496 (I haven't checked yet).

@timholy added the compiler:latency label Jan 30, 2022

@simeonschaub commented
The regression for Plots.jl looks quite serious. Have you looked into that?

@baggepinnen commented Jan 30, 2022

Is the header wrong? The current table does not make me think it's a clear win, almost the opposite. Time to finish task presumably includes loading the package as well, and all but one package take longer to load, some by quite a lot. I can imagine that many Julia sessions load several packages that aren't actually used, and an increase in load time might then hurt more than the saving in TTFT.

@timholy commented Jan 30, 2022

This should have little or no consequence if you don't add precompilation to the package definition, and you get to choose specifically what you precompile and what you don't. So: add whatever helps, and don't add what hurts. This shows you can shave 20-30% off the time for some packages, which is a lot for big packages like Flux & Makie. And just don't add precompilation to the ones where it does more harm than good.

That's a pretty clear win @baggepinnen, no?

@timholy commented Jan 30, 2022

> The regression for Plots.jl looks quite serious. Have you looked into that?

Not yet. It's fresh off the bus; there might indeed be some stuff that can be made better.

@timholy commented Jan 31, 2022

Here's a really fun one. With a package

```julia
module LV

export filter2davx

using LoopVectorization

function filter2davx!(out::AbstractMatrix, A::AbstractMatrix, kern::AbstractMatrix)
    @turbo for J in CartesianIndices(out)
        tmp = zero(eltype(out))
        for I ∈ CartesianIndices(kern)
            tmp += A[I + J] * kern[I]
        end
        out[J] = tmp
    end
    out
end

function filter2davx(A::AbstractMatrix, kern::AbstractMatrix)
    out = similar(A, size(A) .- size(kern) .+ 1)
    return filter2davx!(out, A, kern)
end

# precompilation: run the workload at module top level so its results get cached
A = rand(Float64, 512, 512)
kern = [0.1 0.3 0.1;
        0.3 0.5 0.3;
        0.1 0.3 0.1];
filter2davx(A, kern)

end
```

and a task (this is what the other tasks look like)

```julia
@time using LV
A = rand(Float64, 512, 512)
kern = [0.1 0.3 0.1;
        0.3 0.5 0.3;
        0.1 0.3 0.1];
tstart = time(); filter2davx(A, kern); println(time() - tstart)  # avoid @time to prevent compilation of the task
```

we get for nightly

```
julia> include("tasks_LV.jl")
  2.463157 seconds (6.24 M allocations: 408.848 MiB, 3.33% gc time, 18.92% compilation time)
8.572684049606323
```

and for this PR

```
julia> include("tasks_LV.jl")
  2.932197 seconds (6.09 M allocations: 408.457 MiB, 1.98% gc time, 18.04% compilation time)
0.23031187057495117
```

CC @chriselrod, @ChrisRackauckas.

@timholy added this to the 1.8 milestone Jan 31, 2022
@timholy commented Jan 31, 2022

I dug into the Plots regression a bit. The load-time difference can almost entirely be explained by the time to load GR_jll (2.7s on this PR vs 0.15s on nightly), which in turn depends on dozens of other JLL libraries. There are 169 freshly-compiled external MethodInstances when precompiling GR_jll, and as a consequence typically ~6000 backedges need reinserting (formerly there were 0).

My guess is we can prevent the compilation by doing a little more precompilation in, e.g., Base.BinaryPlatforms when we build Julia---if those methods are already compiled into Julia itself, they won't need to be compiled when precompiling the packages. Yet to be tested in practice.
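As a rough illustration of that idea (hypothetical statements, not part of this PR), the sysimage build could execute explicit `precompile` calls for the offending signatures, e.g. in the workload driven by contrib/generate_precompile.jl:

```julia
# Hypothetical sysimage precompiles for Base.BinaryPlatforms; once these are
# baked into Julia itself, packages would no longer compile them during
# their own precompilation.
precompile(Tuple{Type{Base.BinaryPlatforms.Platform}, String, String})
precompile(Base.BinaryPlatforms.triplet, (Base.BinaryPlatforms.Platform,))
```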

@martinholters commented Jan 31, 2022

Just want to add a datapoint where this PR does really well: I have a branch of ACME lying around where I've added some more precompiles and tweaked the code for better inferability in a few places. On master, that increases the load time from 1.6s to 1.9s and reduces the run-time of some model set-up code (which mostly involves compilation) from 25s to 17s. I was underwhelmed by that result and put it aside (although even that mediocre improvement would probably have been worth including in the main branch). Now with this PR, load times don't change much (a slight improvement, but within measurement noise), and the example task using ACME main improves only slightly, to 23s. But using the ACME branch with more precompiles, it goes down to 6.5s! And given that I had previously abandoned that work, there are probably more precompiles that could be added for even further improvement. So 💯 to this PR from that perspective!

EDIT: Darn, I did @time f() for my measurement. Doing @time @eval f() instead, the improvement due to this PR is much smaller. But it's still there.
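The difference, sketched with a placeholder `f`:

```julia
# With `@time f()`, f can be compiled while the surrounding top-level
# expression is itself being compiled, i.e. before the timer starts.
# `@time @eval f()` defers that compilation into the timed region, so
# first-call latency (inference + codegen) is measured more faithfully.
@time @eval f()
```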

@giordano commented

> For the loading, it can almost entirely be explained by the time to load GR_jll (2.7s on this PR vs 0.15s on nightly), which in turn depends on dozens of other JLL libraries

That's not good, I presume the loading time of GTK3_jll exploded as well.

@timholy commented Jan 31, 2022

@giordano, I've finally got most of them stomped (locally), but the annoying bit is compilations like these:

```
...
structdiff(NamedTuple{(:validate_strict, :libc, :call_abi, :libgfortran_version, :cxxstring_abi, :libstdcxx_version, :os_version), Tuple{Bool, Nothing, Nothing, Nothing, String, Nothing, Nothing}}, Type{NamedTuple{(:validate_strict, :compare_strategies), T} where T<:Tuple}) from structdiff(NamedTuple{an, T} where T<:Tuple, Union{Type{NamedTuple{bn, T} where T<:Tuple}, NamedTuple{bn, T} where T<:Tuple}) where {an, bn}
  from module Base

pairs(NamedTuple{(:libc, :cxxstring_abi), Tuple{String, String}}) from pairs(NamedTuple{names, T} where T<:Tuple where names)
  from module Base.Iterators

_pairs_elt(Base.Pairs{Symbol, String, Tuple{Symbol, Symbol, Symbol}, NamedTuple{(:libc, :cxxstring_abi, :call_abi), Tuple{String, String, String}}}, Symbol) from _pairs_elt(Base.Pairs{K, V, I, A} where A where I, Any) where {K, V}
  from module Base.Iterators

Type##kw(NamedTuple{(:cxxstring_abi,), Tuple{String}}, Type{Base.BinaryPlatforms.Platform}, String, String) from Type##kw(Any, Type{Base.BinaryPlatforms.Platform}, String, String)
  from module Base.BinaryPlatforms
...
```

I've added a block at the end of binaryplatforms.jl containing:

```julia
# Exercise the NamedTuple/keyword-argument code paths observed above so they
# get compiled into the sysimage rather than during package precompilation.
Platform("x86_64", "linux"; validate_strict=true, libc="glibc")
for nt in (
    (libc="glibc", cxxstring_abi=""),
    (libc="glibc", cxxstring_abi="", call_abi=""),
    (                       libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (                       libc="glibc", call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (                       libc="glibc", call_abi="",      libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (                       libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
    (                       libc="glibc", call_abi="",      libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
    (validate_strict=false, libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (validate_strict=false, libc="glibc", call_abi=nothing, libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (validate_strict=false, libc="glibc", call_abi="",      libgfortran_version=nothing, cxxstring_abi=nothing, libstdcxx_version=nothing, os_version=nothing),
    (validate_strict=false, libc=nothing, call_abi=nothing, libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
    (validate_strict=false, libc="glibc", call_abi="",      libgfortran_version=nothing, cxxstring_abi="cxx11", libstdcxx_version=nothing, os_version=nothing),
)
    pairs(nt)
    iterate(pairs(nt))
    iterate(pairs(nt), haskey(nt, :libgfortran_version) ? (nt.libgfortran_version === nothing)+1 : 1)
    merge(nt, NamedTuple())
    merge(nt, Pair{Symbol,String}[])
    Base.structdiff(nt, (validate_strict=false, compare_strategies=Dict{String,Function}()))
    Platform("x86_64", "linux"; nt...)
end
```

but it's annoying and so far isn't complete. Can you think of a better approach?

@KristofferC commented Jan 31, 2022

Is there any way that can be prevented from compiling at all? It seems unlikely to be useful to compile.

timholy added a commit to JuliaPackaging/Preferences.jl that referenced this pull request Jan 31, 2022
Motivated by JuliaLang/julia#43990 (comment),
but probably not a bad idea in its own right.
timholy added a commit to JuliaPackaging/JLLWrappers.jl that referenced this pull request Jan 31, 2022
This is motivated by JuliaLang/julia#43990 (comment),
but doesn't seem likely to hurt anything even if that never merges.
@timholy commented Jan 31, 2022

OK, with those two changes (to Preferences.jl and JLLWrappers.jl, referenced above), I've got the time to load GR_jll back down to about where it was before.

@timholy commented Jan 31, 2022

> Is there any way that can be prevented from compiling at all?

I guess if we reordered the file order in Base.jl we could turn off compilation in BinaryPlatforms?
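A sketch of what that might look like (assuming `Base.Experimental.@compiler_options` is usable at the point binaryplatforms.jl gets included, which is exactly the file-ordering question):

```julia
# Hypothetical: request minimal compilation for this module so its methods
# are not natively compiled while packages precompile. `compile=min` and
# `optimize=0` are existing options of Base.Experimental.@compiler_options.
module BinaryPlatforms

Base.Experimental.@compiler_options compile=min optimize=0

# ... existing module contents ...

end
```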

@timholy commented Jan 31, 2022

That seems to have helped the gap in load times substantially (updated numbers posted above), although it's still not zero. I'm not sure we should expect it to be: if a package uses precompilation, you're storing more stuff in the cache file, so loading might be expected to take longer. It's not really that simple, since load time is actually dominated by backedge insertion and other method-table perusals, but it certainly wouldn't be surprising to see a load-time regression for some packages.

@timholy commented Feb 1, 2022

Ok, except for a backwards-compatible shim that nothing calls, I just got rid of kwargs in the constructor for Platform. I've left the various NamedTuple specializations commented out because they were a ton of work to discover, and I'd hate to lose them to an accidental rebase; I'll strip that garbage before merging.

With this and JuliaPackaging/Preferences.jl#28 we're basically tied with master wrt the time to load GR_jll.

@timholy commented Feb 1, 2022

With the latest changes,

| Package | Load (nightly) | Load (PR) | TTFT (nightly) | TTFT (PR) |
| --- | --- | --- | --- | --- |
| Plots | 3.0 | 3.5 | 10.9 | 10.8 |

so we're definitely at parity. The time to actually plot seems to vary a lot, and of course some packages might have been updated since I first started running this, but the two runs here use the same Manifest file and were made moments apart, so the comparison between them is valid.

@timholy commented Feb 1, 2022

And now with MakieOrg/Makie.jl#1634 we get

| Package | Load (nightly) | Load (PR) | TTFT (nightly) | TTFT (PR) |
| --- | --- | --- | --- | --- |
| GLMakie | 6.6 | 9.9 | 52.0 | 33.9 |

With those changes, the majority of Makie's inference time has been wiped out (~27s of the remaining TTFT is not inference).

@timholy commented Feb 1, 2022

I noticed one problem & fixed it (see the commit just above this). Assuming CI passes, I think this is ready to go? But in contrast with the two predecessor PRs, I don't feel comfortable merging this without a review from someone who knows more than me.

@vchuravy requested a review from vtjnash February 2, 2022
(Review thread on src/codegen.cpp; outdated, resolved.)
@Krastanov commented
This pull request reduced time-to-first-X for QuantumClifford from 11 seconds to 0.5 seconds. This is amazing! Thank you!

Hopefully this is the correct venue to share this and I am not causing noise.

@timholy commented Feb 25, 2022

> A key point is that for me as a developer, the metric to optimize is not TTFT but (precompile time) + TTFT (i.e., the cost of restarting a Julia session). Working to precompile things in DFTK looks like hard and confusing work that will make TTFT go down but precompile time go up

Understood, but do keep in mind that typically many more people use a package than develop it, so there's a bit of a tradeoff between developer convenience and user convenience.

You may be able to have both, though: it should be possible to set up a compile-time Preference that will control whether you precompile locally with generated code. E.g., you might be able to set a local environment variable JULIA_PRECOMPILE_DFTK=0 and check for that during DFTK build.
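A minimal sketch of that gate in the package's top-level module (the variable name `JULIA_PRECOMPILE_DFTK` is hypothetical; a compile-time Preference would be more robust, since preferences participate in cache invalidation):

```julia
# Hypothetical opt-out: setting JULIA_PRECOMPILE_DFTK=0 skips the heavy
# precompile workload when the package is (re)precompiled.
if get(ENV, "JULIA_PRECOMPILE_DFTK", "1") != "0"
    include("precompile.jl")
end
```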

That said, there are some things Julia could do better here. We recompile all dependents whenever a package changes, and perhaps we could be more selective (kind of like linking a bunch of .o files in a C compiler). But that's actually a pretty hard problem, so don't expect it to be solved anytime soon.

@baggepinnen commented
I also recently thought about the issue of the cost of precompilation for developers. Couldn't you choose whether or not to run the precompilation code based on whether the package is checked out for development? Something like the following pseudocode:

```julia
if !is_developed(@__MODULE__)
    include("precompile.jl")
end
```

My initial experiments with this appeared to work well. My implementation of `is_developed` simply checked the path to the package:

```julia
const __CONTROLSYSTEMS_SOURCE_DIR__ = dirname(Base.source_path())
is_developed() = !(occursin(joinpath(".julia", "dev"), __CONTROLSYSTEMS_SOURCE_DIR__))
```
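A variant of that check, as a sketch (assuming registered packages are installed under the depot's `packages` directory while dev'ed ones are not), could key off `pkgdir` instead of a hard-coded source path:

```julia
# Sketch: treat any package not installed under .julia/packages as "developed".
# pkgdir(m) returns the package root for module m, or nothing outside a package.
is_developed(m::Module) = !occursin(joinpath(".julia", "packages"), something(pkgdir(m), ""))
```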

@timholy commented Feb 25, 2022

That's a pretty clever approach. I'd want to shield that behind a flag so that I could easily check the consequences of precompilation even if I have the package dev'ed, but there's a lot to like about your suggestion.
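Such a flag might look like this (the environment-variable name is hypothetical):

```julia
# Hypothetical override so the precompile workload runs even on a dev'ed checkout:
const FORCE_PRECOMPILE = get(ENV, "JULIA_FORCE_PRECOMPILE", "0") == "1"
if FORCE_PRECOMPILE || !is_developed()
    include("precompile.jl")
end
```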

@DilumAluthge removed the status:merge me label Feb 25, 2022
staticfloat pushed a commit to JuliaCI/julia-buildkite-testing that referenced this pull request Mar 2, 2022
Prior to this PR, Julia's precompiled `*.ji` files saved just two
categories of code: unspecialized method definitions and
type-specialized code for the methods defined by the package.  Any
novel specializations of methods from Base or previously-loaded
packages were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during
package definition that hadn't been previously inferred, as long
as there is a backedge linking it back to a method owned by
a module being precompiled. (The latter condition ensures it will
actually be called by package methods, and not merely transiently
generated for the purpose of, e.g., metaprogramming or variable
initialization.) This makes precompilation more intuitive (now it
saves all relevant inference results), and substantially reduces
latency for inference-bound packages.

Closes JuliaLang#42016
Fixes JuliaLang#35972

Issue JuliaLang#35972 arose because codegen got started without re-inferring
some discarded CodeInstances. This forced the compiler to insert a
`jl_invoke`. This PR fixes the issue because needed CodeInstances are
no longer discarded by precompilation.
KristofferC pushed a commit that referenced this pull request Mar 7, 2022, with the same commit message as above.

(cherry picked from commit df81bf9)
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022, with the same commit message as above.
@KristofferC removed the backport 1.8 label Mar 15, 2022
timholy added a commit that referenced this pull request Jul 13, 2022
This fixes a separate bug in which #43990 should have checked
that the method had not been deleted. Tests passed formerly
simply because we weren't caching external CodeInstances
that inferred down to a `Const`; fixing that exposed the bug.
This bug has been exposed since merging #43990 for non-`Const`
inference, and would affect Revise etc.
timholy added a commit that referenced this pull request Jul 21, 2022
This allows a MethodInstance to store dispatch tuples as well as other
MethodInstances among their backedges. The motivation is to properly
handle invalidation with `invoke`, which allows calling a
less-specific method and thus runs afoul of our standard mechanisms to
detect "outdated" dispatch.

Additionally:

- provide backedge-iterators for both C and Julia that abstract
  the specific storage mechanism.

- fix a bug in the CodeInstance `relocatability` field, where methods
  that only return a constant (and hence store `nothing` for
  `inferred`) were deemed non-relocatable.

- fix a bug in which #43990 should have checked that the method had
  not been deleted. Tests passed formerly simply because we weren't
  caching external CodeInstances that inferred down to a `Const`;
  fixing that exposed the bug.  This bug has been exposed since
  merging #43990 for non-`Const` inference, and would affect Revise
  etc.
@JeffBezanson added the compiler:precompilation label Jul 28, 2022
timholy added a commit that referenced this pull request Aug 22, 2022, with the same commit message as the Jul 21 commit above.
timholy added a commit that referenced this pull request Aug 24, 2022
This fixes a long-standing issue with how we've handled `invoke` calls
with respect to method invalidation.  When we load a package, we need
to ask whether a given MethodInstance would be compiled in the same
way now (aka, in the user's running session) as when the package was
precompiled; in practice, the way we do that is to test whether the
dispatches would be to the same methods in the current
world-age. `invoke` presents special challenges because it allows the
coder to deliberately select a different method than the one that
would be chosen by ordinary dispatch; if there is no record of how
this choice was made, it can look like it resolves to the wrong method
and this can trigger invalidation.

This allows a MethodInstance to store dispatch tuples as well as other
MethodInstances among their backedges.

Additionally: the same three fixes listed in the Jul 21 commit above (backedge iterators, the CodeInstance `relocatability` bug, and the deleted-method check).

Co-authored-by: Jameson Nash <[email protected]>
vtjnash pushed a commit that referenced this pull request Nov 29, 2022
This includes only the changes to `dump.c` for this change, but excludes the functional part (except for the additional bugfixes mentioned in the original message).

ORIGINAL COMMIT TEXT: same as the Aug 24 commit above, including the three additional bugfixes.

(cherry picked from commits dd375e1,
cb0721b, 9e39fe9)
Labels: compiler:latency, compiler:precompilation

Successfully merging this pull request may close these issues: Allocates when precompiled.