Problem with starting multiple Julia process on a cluster at the same time #31953

Closed
newptcai opened this issue May 7, 2019 · 6 comments
Labels
compiler:precompilation Precompilation of modules

Comments

@newptcai

newptcai commented May 7, 2019

I am trying to start multiple Julia processes on a cluster at the same time using a Python script (parallel-ssh). I noticed that a few of these processes fail to start, with the error below.

The cluster has a shared network file system, which could be the source of the issue. However, this was never a problem previously, when everything was done in Python.

ERROR: LoadError: SystemError: opening file "/home/myname/.julia/compiled/v1.1/MyPackage/ttkBW.ji": No such file or directory
Stacktrace:
 [1] #systemerror#43(::Nothing, ::Function, ::String, ::Bool) at ./error.jl:134
 [2] systemerror at ./error.jl:134 [inlined]
 [3] #open#309(::Bool, ::Nothing, ::Nothing, ::Nothing, ::Nothing, ::Function, ::String) at ./iostream.jl:283
 [4] #open at ./none:0 [inlined]
 [5] open(::String, ::String) at ./iostream.jl:339
 [6] stale_cachefile(::String, ::String) at ./loading.jl:1321
 [7] _require_search_from_serialized(::Base.PkgId, ::String) at ./loading.jl:693
 [8] _require(::Base.PkgId) at ./loading.jl:937
 [9] require(::Base.PkgId) at ./loading.jl:858
 [10] require(::Module, ::Symbol) at ./loading.jl:853
 [11] include at ./boot.jl:326 [inlined]
 [12] include_relative(::Module, ::String) at ./loading.jl:1038
 [13] include(::Module, ::String) at ./sysimg.jl:29
 [14] exec_options(::Base.JLOptions) at ./client.jl:267
 [15] _start() at ./client.jl:436
in expression starting at /home/myname/code/MyPackage.jl/src/gwsim.jl:3

newptcai changed the title from "Problem with starting multiple julia process at the same time" to "Problem with starting multiple Julia process on a cluster at the same time" on May 7, 2019
@newptcai
Author

newptcai commented May 7, 2019

I added a sleep of 0.1 between starting each process. This seems to avoid the problem.
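
A minimal sketch of what such a staggered launch could look like in a Python launcher. This is not the original script: the function name `launch_staggered`, the host names, and the script path in the comments are hypothetical, and the 0.1 is assumed to be in seconds (the unit `time.sleep` takes).

```python
import subprocess
import time


def launch_staggered(commands, delay=0.1, popen=subprocess.Popen, sleep=time.sleep):
    """Start each command in turn, pausing `delay` seconds between launches.

    `popen` and `sleep` are injectable only so the function can be exercised
    without starting real processes.
    """
    procs = []
    for i, cmd in enumerate(commands):
        if i > 0:
            # Give the previous process a head start on the shared cache.
            sleep(delay)
        procs.append(popen(cmd))
    return procs


# Example use (hypothetical hosts and script path, not run here):
#   hosts = ["node01", "node02", "node03"]
#   cmds = [["ssh", h, "julia", "gwsim.jl"] for h in hosts]
#   for p in launch_staggered(cmds, delay=0.1):
#       p.wait()
```

The delay only makes the collision less likely; it does not remove the underlying race on the shared cache directory.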

@vchuravy
Sponsor Member

vchuravy commented May 7, 2019

Related issue: #30174

@WT215

WT215 commented Aug 19, 2019

> I added 0.1 sleep time between starting each process. This seems to have avoided the problem.

Hi @newptcai, I have a similar issue when deploying Julia on an HPC cluster to run jobs. Sometimes the package "KernelDensity" loads correctly in a worker, but sometimes it fails.

How did you add sleep time between starting each process?

Thanks in advance.

simonbyrne added the compiler:precompilation (Precompilation of modules) label on May 9, 2022
@simonbyrne
Contributor

This is still an issue on 1.7.2:
https://buildkite.com/clima/climaatmos-ci/builds/864#babf55c9-1a81-4228-9a61-163128f0e1dd

My guess as to what is happening is that one process is attempting to load a precompiled package while another deletes it from the cache. In particular, it gets deleted between this line
https://github.com/JuliaLang/julia/blob/v1.7.2/base/loading.jl#L1087
and this line
https://github.com/JuliaLang/julia/blob/v1.7.2/base/loading.jl#L1097
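
The suspected failure mode is a check-then-open race. The sketch below reproduces that pattern in Python purely as an illustration; it is not Julia's loading code, and `load_cache`, `after_check`, and `demo_race` are hypothetical names. The `after_check` hook stands in for whatever another process does in the window between the two steps.

```python
import os
import tempfile


def load_cache(path, after_check=lambda: None):
    """Check-then-open: the pattern that is vulnerable to a concurrent delete."""
    if not os.path.isfile(path):   # step 1: the file is there...
        return None
    after_check()                  # window in which another process may act
    with open(path, "rb") as f:    # step 2: ...but may be gone by now
        return f.read()


def demo_race():
    """Return the exception raised when the file vanishes inside the window."""
    fd, path = tempfile.mkstemp(suffix=".ji")
    os.write(fd, b"cached")
    os.close(fd)
    try:
        # Simulate the other process deleting the cache file mid-load.
        load_cache(path, after_check=lambda: os.remove(path))
    except FileNotFoundError as err:
        return err
    return None
```

With the deletion injected into the window, the open fails with "No such file or directory", the same kind of error as in the original report.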

There are a couple of options as far as I can see:

  1. Some sort of locking to prevent deletion while loading: we now have Pidfile support in the FileWatching stdlib (Add Pidfile to FileWatching #44367), but unfortunately the loading code is in Base, so it's not clear how we could make use of it.
  2. Allow _require_search_from_serialized to fail gracefully if it can't open the file.
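
Both options can be sketched generically. The Python below is an analogy under stated assumptions, not the Julia implementation: `lockfile` and `read_cache_or_none` are hypothetical helpers, the lock does not detect stale lock files left by crashed processes, and its behavior on network file systems is not addressed.

```python
import contextlib
import os
import time


@contextlib.contextmanager
def lockfile(path, poll=0.05, timeout=10.0):
    """Option 1: hold an exclusive lock file while touching the cache."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_CREAT | O_EXCL fails if the lock file already exists.
            fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"could not acquire {path}")
            time.sleep(poll)
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner, pidfile-style
        os.close(fd)
        yield
    finally:
        os.remove(path)  # release the lock


def read_cache_or_none(path):
    """Option 2: treat a vanished cache file as a cache miss, not an error."""
    try:
        with open(path, "rb") as f:
            return f.read()
    except FileNotFoundError:
        return None  # caller can try another cache file or recompile
```

Option 2 is the smaller change, since a `None` result can simply be handled like any other stale or missing cache entry. Option 1 prevents the deletion from happening during a load in the first place, at the cost of coordinating every reader and writer through the lock.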

@IanButterworth
Sponsor Member

Master now has pidfile locking around the precompilation process.

@simonbyrne
Contributor

simonbyrne commented Jul 5, 2023

The relevant PRs appear to be #49052, #50214, and #50254.
