Possible bug when using both processes and threads, and dynamic remote channels between them #78
Comments
@jonas-schulze, thanks for the pointer! I have also tried to replicate the problem without using any worker threads (see https://discourse.julialang.org/t/are-julia-channels-futures-thread-safe-with-a-failing-code-example/64490). Is it possible that Distributed has a problem with Threads even in a single process? If so, that would explain the problem I am seeing, but it seems this would go beyond the scope of #73?
I have code which:
The motivation is to have one multi-threaded process on each server in a compute cluster. In the code demonstrating the bug (see below), however, all processes run on the same (local) machine; the code doesn't make use of this fact (it does not use shared memory or atomics, only remote channels).
Leaving aside whether this is a good idiom, the approach is legal and should work(?).
However, running this in Julia 1.6.1 produces nondeterministic failures:
And the error message complains that:
This seems to be a bug, unless the code does something "forbidden" (it doesn't seem to?). It was suggested that the GC issues might be related to JuliaLang/julia#38180, but that issue doesn't seem to cover the concurrency errors in the crash traces.
The source code and output crash traces are available in https://gist.github.com/orenbenkiki/ac71f348d4915b394805656b142b33fe
To run it, type:
JULIA_NUM_THREADS=4 julia Bug.jl 4 1000 quiet
You can play with the number of threads, the number of processes (here, also 4), the number of requests sent by each thread of each process (here, 1000), and whether the code is quiet or verbose (the latter uses println and flush a lot, which will impact the behavior).