Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix #41546, make using thread-safe #41602

Merged
merged 3 commits into from
Oct 25, 2021
Merged

fix #41546, make using thread-safe #41602

merged 3 commits into from
Oct 25, 2021

Conversation

JeffBezanson
Copy link
Sponsor Member

fix #41546

I'm not sure if identify_package needs locking as well.

@JeffBezanson JeffBezanson added domain:packages Package management and loading domain:multithreading Base.Threads and related functionality backport 1.7 labels Jul 15, 2021
Copy link
Member

@Sacha0 Sacha0 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much Jeff! This is exciting! I will pull this patch onto our testing branch and report back :). Thanks again!

base/toml_parser.jl Outdated Show resolved Hide resolved
@Sacha0
Copy link
Member

Sacha0 commented Jul 16, 2021

With this patch our test suite silently times out after launching tests, which may or may not be progress? 😄

@@ -1020,13 +1024,14 @@ function require(uuidkey::PkgId)
end
Copy link
Sponsor Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

seems suspect that we want to be holding a lock for this large of a chunk of (user) code, including the include and package_callbacks here, but we do need it for all of the shared state, which is quite a bit.

JeffBezanson and others added 3 commits October 19, 2021 11:15
Co-authored-by: Jameson Nash <[email protected]>
use more precision when handling loading lock, merge with TOML lock
(since we typically are needing both, sometimes in unpredictable
orders), and unlock before call most user code
@JeffBezanson JeffBezanson merged commit 3d4b213 into master Oct 25, 2021
@JeffBezanson JeffBezanson deleted the jb/pkgloadinglock branch October 25, 2021 20:20
KristofferC pushed a commit that referenced this pull request Oct 28, 2021
use more precision when handling loading lock, merge with TOML lock
(since we typically are needing both, sometimes in unpredictable
orders), and unlock before call most user code

Co-authored-by: Jameson Nash <[email protected]>
(cherry picked from commit 3d4b213)
KristofferC added a commit that referenced this pull request Oct 29, 2021
@timholy
Copy link
Sponsor Member

timholy commented Dec 4, 2021

git bisect identified this as at least part of what broke Revise on nightly. Here's a replication that doesn't take too long to run:

julia -e 'import Pkg; Pkg.test("Revise";coverage=false, test_args=["New files & Requires.jl"])'

Revise can work around this specific problem by adding ~0.5s sleeps, but that only seems to work when you're not tracking code-coverage (when you are, the tests just hang inside a wait, but I don't know if it's related to this PR). See timholy/Revise.jl#657.

@mkitti
Copy link
Contributor

mkitti commented Dec 11, 2021

I believe this commit was responsible for stalling HDF5.jl's nightly testing for the past two months:
https://github.com/JuliaIO/HDF5.jl/runs/4018361152?check_suite_focus=true

The symptom was that the CI would stall and eventually timeout after 6 hours. The intermediate solution was to add a 20 minute timeout to CI:
JuliaIO/HDF5.jl#877

Upon debugging, I found the issue came when trying to loading external compression codecs when testing filters

[ Info: libhdf5 v1.12.0
┌ Debug: plain
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:14
┌ Debug: compound
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:16
┌ Debug: custom
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:18
┌ Debug: reference
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:20
┌ Debug: dataspace
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:22
┌ Debug: hyperslab
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:24
┌ Debug: readremote
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:26
┌ Debug: extend_test
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:28
┌ Debug: gc
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:30
┌ Debug: external
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:32
┌ Debug: external tests did not delete
│   fn2 = "C:\\cygwin64\\tmp\\jl_TuY7eOkBtL"
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\external.jl:57
┌ Debug: swmr
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:34
┌ Debug: mmap
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:36
┌ Debug: properties
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:38
┌ Debug: table
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:40
┌ Debug: filter
└ @ Main C:\Users\kittisopikulm\.julia\dev\HDF5\test\runtests.jl:42
# Then it deadlocks here.

After adding some debug messages to trace package loading, I found that a second process was trying to load one of the compression codecs via the H5Zbzip2.jl subdirectory package, which loads CodecBzip2.jl and Bzip2_jll. The second process came from the swmr.jl tests which used Distributed.addproc(1) to add a process. Removing the second process resolved the issue solved the stall in the tests.

Commenting out rmprocs is sufficient to cause the stall.

I have created a branch / draft pull request on HDF5.jl for the purpose of investigating this issue:
JuliaIO/HDF5.jl#881

@Sacha0
Copy link
Member

Sacha0 commented Dec 11, 2021

Possibly related to #43339 (comment)

LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Feb 22, 2022
use more precision when handling loading lock, merge with TOML lock
(since we typically are needing both, sometimes in unpredictable
orders), and unlock before call most user code

Co-authored-by: Jameson Nash <[email protected]>
LilithHafner pushed a commit to LilithHafner/julia that referenced this pull request Mar 8, 2022
use more precision when handling loading lock, merge with TOML lock
(since we typically are needing both, sometimes in unpredictable
orders), and unlock before call most user code

Co-authored-by: Jameson Nash <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:multithreading Base.Threads and related functionality domain:packages Package management and loading
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Concurrency violation during package loading
6 participants