Crashing with @everywhere using #12381
Comments
... and bisect is broken:
Will try some brute-force. |
No crashes, but errors here:
I will assume that this is a separate issue and will attempt to pinpoint the version that crashes. |
did you precompile this package? |
@jakebolewski at one time, yes, but ~/.julia/lib/v0.4 is currently empty. |
According to bisect results,
Validating that the previous version actually works... there were several errors encountered throughout. |
The version prior to bisect's "first bad commit" turns out to be #12381 (comment) so it looks like Tim's commit fixed that bug but may have uncovered the crash bug. cc @timholy |
re: #12381 (comment), if you go back far enough that dependencies changed versions you'll often need to do |
The following is with no cleaning before the rebuild. I get different behavior on two machines.
Hangs (for at least several minutes) with:
Different machine
|
I also find that one to be strange as a culprit, but CCing @ScottPJones anyway. |
I'm not at home now, but I can't see how it would have an effect, unless there were invalid UTF-8 data that wasn't detected before, but you would have seen a |
Interesting. What I posted above was performed on OSX. On Linux, both commits work just fine. Current master on Linux fails, though, as on OSX. If nobody beats me to it I will bisect again on both systems tomorrow. |
i don't think a bisect is entirely required for this one: from the issue description above, it's apparent that there's a race condition between the call to |
bisects are not at all accurate for things like race conditions - they can only give you some idea as to some bounds where the bug was introduced. |
Yes, it looks like a race condition. The type of error produced (and whether they occur at all) changes from run to run. |
For testing, I find the bug more likely to occur with four processes than two. |
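Because the failure is intermittent, a single run says little. One rough way to compare configurations (for example two workers versus four) is to repeat the same command many times and count the failures. The following is a minimal shell sketch under stated assumptions: `count_failures` is a hypothetical helper name, not part of Julia or of this issue, and it assumes that a crash or an uncaught error makes the command exit with a non-zero status (warnings alone would not be counted).

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Runs CMD N times and prints how many runs exited with a non-zero status.
# Hypothetical helper for illustration only.
count_failures() {
    runs=$1
    shift
    failures=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Discard output; only the exit status matters here.
        if ! "$@" >/dev/null 2>&1; then
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "$failures"
}
```

An illustrative call, modeled on the commands used in this thread, would be `count_failures 20 ./julia -e "addprocs(4); using JSON, FactCheck, Compat"`. A run that hangs will stall the loop, so where GNU `timeout` is available it may be worth prefixing the command with it to bound each run.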
If I do
Then building commit 88bb2e9 shows the bug. |
FWIW, sleeping a little makes it pass for me again on current master:

    ./julia -e "addprocs(6); @everywhere sleep(0.1); using JSON, FactCheck, Compat"

Calling it without

    ./julia -e "addprocs(6); using JSON, FactCheck, Compat"

|
sleeping a little makes everything pass |
#12581 does seem to fix the segfault on my local machine. Could other folks test it out? This is what I get.
and
Warnings and errors but no segfault. |
why are you getting "WARNING: node state is inconsistent" there? that generally is going to be really, really bad. |
I cleaned Now with
Why did it precompile the first time around and not now? |
The |
@stevengj It makes sense that your change would fix the segfault. Is there evidence of some other problem as well, or are we done here? |
We haven't had any indication that it is still segfaulting. It would be good to have an issue for eliminating the warning, but probably that should be a separate issue. |
Should we then take this off the 0.4 milestone list? |
i think there are a few improvements that can be made:
|
Is the following deserialization error on workers a manifestation of this race condition?
results in reloading module warnings and then a large number of workers exiting with error:
Moving |
We faced the same issue in DecisionTree.jl, and I've boiled it down to this. No precompilation necessary (on Julia 0.4, OSX):

    # B.jl
    module B
    end

    # C.jl
    module C
    abstract AbstractAbstract
    end

    # A.jl
    module A
    using B   # can be any module
    include("incl.jl")   # problem disappears if the import is done in A.jl
    end

    # incl.jl
    import C: AbstractAbstract
    type Obj <: AbstractAbstract end

then interactively:

    addprocs(3)
    @everywhere using A
    > On worker 2: UndefVarError: AbstractAbstract not defined

|
|
I'm not sure this is a closed issue (I'm on 0.5.0).

    for p in procs()
        @fetchfrom p reload("Package")
    end

|
@pearcemc, do |
Fixed by #21718? |
…ove include_from_node1 (JuliaLang#22588)
* include: assume that shared file systems exist for clusters
  remove buggy support for emulating shared file-systems from Julia: the kernel is much better at this, and can do it transparently
* only broadcast using/import to nodes which need it
  fix JuliaLang#12381
  fix JuliaLang#13999
Per https://groups.google.com/d/msg/julia-users/FjGXSTzvfmc/j0ZDG629IwAJ
This works on 0.4.0-dev+5008 (2015-05-26 16:08 UTC), commit 0855ec9.
Next step: git bisect. Stand by.