Darwin/ARM64: Julia freezes on @threads loops #45626

Closed
ssagl opened this issue Jun 9, 2022 · 15 comments · Fixed by #45899
Labels
domain:multithreading Base.Threads and related functionality system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips

Comments

@ssagl

ssagl commented Jun 9, 2022

I'm encountering a previously documented issue when running @threads loops on the native arm64 build of Julia 1.8.0-rc1. This was a known issue before but has been marked as closed (link to issue). The bug still seems to exist and is causing trouble for me. Below is a minimal example quoted from the thread that initially documented this issue (Originally posted by @gbaraldi in #41820 (comment)):

julia> function foo()
          Threads.@threads for i in 1:10
               rand()
           end
       end
foo (generic function with 1 method)

julia> for i in 1:1000
           println(i)
           for j in 1:10000
               foo()
           end
       end

This freezes for me somewhere around i = 11 on my Mac with the current 1.8.0-rc1 version of Julia.

Below I attach the exact output of versioninfo():

julia> versioninfo()
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores
@gbaraldi
Member

gbaraldi commented Jun 9, 2022

I can't reproduce it at all 🤔

@ssagl
Author

ssagl commented Jun 9, 2022

Thank you @gbaraldi. I ran into this issue multiple times over the last 48 hours and was able to consistently reproduce it in the REPL in Julia 1.7.2 as well as in Julia 1.8.0-beta3 and 1.8.0-rc1, so I thought it was happening all the time. After reading your post, I tried to rerun what I did earlier and the freeze seemingly disappeared, which confused me a lot.

However, thanks to your post I have now found out something new and even more interesting: this does not happen every time I run the script. I have to run it over and over again, and it only sometimes freezes (it takes quite a lot of tries). Right now I have two terminals open on my Mac in which I ran the code posted above, and they are frozen at i = 8 and i = 11 respectively.

@ssagl
Author

ssagl commented Jun 9, 2022

Something more I have found out: the freeze is even more likely when two terminal sessions execute the code above at the same time, but running two sessions is not necessary to produce it on my system. Also, I'm using 8 threads, as shown in the versioninfo() output I posted above.
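Since the thread count keeps coming up, it may help to print the threading configuration at the top of each run so it ends up in the same log as the counter. A minimal sketch using only `Threads.nthreads()` and `Sys.CPU_THREADS` from Base; the helper name `thread_config` is made up for illustration and is not part of Julia:

```julia
# Hypothetical helper (not part of Base): gather the number of Julia threads
# in this session and the CPU thread count that Julia detected.
function thread_config()
    return (nthreads = Threads.nthreads(), cpu_threads = Sys.CPU_THREADS)
end

cfg = thread_config()
println("Julia threads: ", cfg.nthreads, " (Sys.CPU_THREADS = ", cfg.cpu_threads, ")")
```

Placing the `println` at the top of the script would make each saved log self-describing, without relying on a separate versioninfo() paste.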

@giordano giordano added domain:multithreading Base.Threads and related functionality system:apple silicon Affects Apple Silicon only (Darwin/ARM64) - e.g. M1 and other M-series chips labels Jun 10, 2022
@fxcoudert
Contributor

I've been trying to reproduce this with:

Julia Version 1.9.0-DEV.612
Commit 2159bfba2f (2022-05-18 18:02 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1 Pro
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 6 on 6 virtual cores

but I can't. I can even run up to 8 terminals running this same code, all on 6 threads, without issue. I repeated multiple times, but cannot get it to freeze. So either it's specific to 1.8, or there is some weird trigger hidden…

@ssagl
Author

ssagl commented Jun 10, 2022

Thanks for your effort @fxcoudert. I recorded my screen while doing this with two open terminal sessions, because I'm puzzled as to why this would only happen on my machine. I really don't know what the trigger is, but if I'm doing something wrong in the terminal session please tell me. The script foo.jl that I run in these terminal sessions is exactly what I posted above...

[Attached video: threads_freeze.mov]

@giordano
Contributor

giordano commented Jun 10, 2022

I was able to reproduce #41820 very easily back when it was a problem, but #43418 fixed it for me. I ran the code above multiple times and it never froze:

julia> versioninfo()
Julia Version 1.9.0-DEV.732
Commit ada860fe7d* (2022-06-10 15:07 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.5.0)
  CPU: 8 × Apple M1
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 4 on 4 virtual cores

@ssagl can you check CPU usage in Activity Monitor? When #41820 was an issue it used to drop to 0 for me, indicating a deadlock situation. I agree with @fxcoudert that there may be something else going on causing the issue for you.

@ssagl
Author

ssagl commented Jun 10, 2022

Thanks @giordano. I checked, and when it freezes CPU usage does not go to zero; it drops from around 400% initially to 100%. A screenshot from Activity Monitor taken during the freeze is below. This doesn't seem like the deadlock you experienced previously.

[Screenshot: Screen Shot 2022-06-10 at 7 49 00 PM]

@vchuravy
Sponsor Member

Is there a way on macOS to see which cores it got scheduled on? Maybe it's something to do with the efficiency cores?

@ssagl
Author

ssagl commented Jun 11, 2022

@vchuravy yes there is, thanks! I've attached a snapshot of my CPU usage history while running the code. Right when it starts to freeze, performance cores 6, 7, 8, 9, and 10 seem to stop doing anything. Performance core 6 seems to come back at some point, but I'm not sure what that all means.

[Screenshot: cpu_history]

@ngam

ngam commented Jun 12, 2022

I can reproduce this with two concurrent sessions (i.e. two terminals running julia-1.8 -t 8 foo.jl as above). Interestingly, I cannot reproduce it with -t 3 or lower, and it tends to get stuck at a higher counter value for lower -t x, e.g. stopping around 30 for -t 8, 70 for -t 7, 120 for -t 5, 700 for -t 5 (but sometimes around 50 for -t 4). Only one of the two processes got stuck for me.

I will let @vchuravy take it from here... but do note the behavior of -t x vis-a-vis "virtual cores" in the collapsed section below.


~$ julia-1.8            
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo
versioninfo (generic function with 2 methods)

julia> versioninfo()
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 1 on 8 virtual cores

julia> versioninfo^C

julia> 
~$ julia-1.8 -t 8
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 8 on 8 virtual cores

julia> 
~$ julia-1.8 -t 10
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo()
Julia Version 1.8.0-rc1
Commit 6368fdc6565 (2022-05-27 18:33 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: 10 × Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, apple-m1)
  Threads: 10 on 8 virtual cores

julia> 
~$ julia-1.8 -t 3 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> versioninfo(^C

julia> 
~$ julia-1.8     
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-rc1 (2022-05-27)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> 

@ngam

ngam commented Jun 12, 2022

Okay, I can now also reproduce with one process only, with julia-1.8 -t 8 foo.jl.

For what it's worth, staying below the total number of performance cores does not guarantee that you get only performance cores. If you ask for all or most of them, you will likely be assigned efficiency cores anyway, even on an 8-to-2 split like here.

Stuff printed after ^C below.

^C
signal (2): Interrupt: 2
in expression starting at /Users/ngam/foo.jl:7
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
kevent at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
__psynch_cvwait at /usr/lib/system/libsystem_kernel.dylib (unknown line)
unknown function (ip: 0x0)
JL_UV_LOCK at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
ijl_exit_threaded_region at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
threading_run at ./threadingconstructs.jl:41
macro expansion at ./threadingconstructs.jl:89 [inlined]
foo at /Users/ngam/foo.jl:2 [inlined]
top-level scope at /Users/ngam/foo.jl:10
jl_toplevel_eval_flex at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
jl_toplevel_eval_flex at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
ijl_toplevel_eval_in at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
eval at ./boot.jl:368 [inlined]
include_string at ./loading.jl:1281
ijl_apply_generic at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
_include at ./loading.jl:1341
include at ./Base.jl:422
jfptr_include_28021 at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
ijl_apply_generic at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
exec_options at ./client.jl:303
_start at ./client.jl:522
jfptr__start_29833 at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/sys.dylib (unknown line)
ijl_apply_generic at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
true_main at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
jl_repl_entrypoint at /Applications/Julia-1.8.app/Contents/Resources/julia/lib/julia/libjulia-internal.1.8.dylib (unknown line)
unknown function (ip: 0x0)
Allocations: 113115497 (Pool: 113115426; Big: 71); GC: 197

@ssagl
Author

ssagl commented Jun 12, 2022

@ngam thanks a ton, finally somebody who can reproduce this issue! I was starting to think this problem was specific to my machine. Regarding your only being able to reproduce it with two concurrent sessions: I think it happens extremely infrequently with just one session (but that's still how I first noticed the problem). I can also confirm that with -t 3 or lower it doesn't happen on my machine either.

@ngam

ngam commented Jun 12, 2022

> Concerning that you can only reproduce with two concurrent sessions

I can now...

But I suspect the line of reasoning of vchuravy above is key to figuring this out.

@ssagl
Author

ssagl commented Jun 12, 2022

> > Concerning that you can only reproduce with two concurrent sessions
>
> I can now...
>
> But I suspect the line of reasoning of vchuravy above is key to figuring this out.

Oh, we apparently posted at the same time. Great to hear you can reproduce everything now @ngam.

@ngam

ngam commented Jun 12, 2022

My observation from playing with this is that the freeze is significantly more likely if you use efficiency cores. You can notice the degradation very early on, with much slower printing of the index. Anyway, below is a totally non-scientific stat from three consecutive runs with time ... (meaning ... -t 4 ... then ... -t 5 ... then ... -t 3 ...). It may also help to profile the whole thing as a function of x in -t x?

julia-1.8 -t 4 foo.jl 134.00s user 27.35s system 339% cpu 47.578 total
julia-1.8 -t 5 foo.jl 334.12s user 58.80s system 412% cpu 1:35.28 total
julia-1.8 -t 3 foo.jl 80.21s user 24.43s system 251% cpu 41.670 total

edit: foo.jl below

function foo()
        Threads.@threads for i in 1:10
                rand()
        end
end

for i in 1:1000
        println(i)
        for j in 1:10000
                foo()
        end
end
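To compare thread counts without watching each run by hand, the script above could be driven from a second Julia process with a timeout. This is only a sketch under stated assumptions: `run_with_timeout` and `sweep` are hypothetical helper names, the 600-second default is arbitrary, foo.jl is assumed to be in the current directory, and the binary returned by `Base.julia_cmd()` is assumed to be the build under test. A run that exceeds the limit is reported as `:hung` and killed, which cannot tell a true freeze apart from a run that is merely slow.

```julia
# Run `cmd` in a child process and classify the outcome as :ok, :failed, or :hung.
function run_with_timeout(cmd::Cmd, timeout_s::Real)
    proc = run(pipeline(cmd; stdout=devnull, stderr=devnull); wait=false)
    # Poll until the child exits or the timeout elapses.
    status = timedwait(() -> process_exited(proc), Float64(timeout_s))
    if status !== :ok
        # Use SIGKILL because a frozen process may not react to gentler signals.
        kill(proc, Base.SIGKILL)
        wait(proc)
        return :hung
    end
    return success(proc) ? :ok : :failed
end

# Launch the reproducer once per thread count and collect the outcomes.
function sweep(script::AbstractString, thread_counts; timeout_s::Real=600)
    results = Dict{Int,Symbol}()
    for t in thread_counts
        cmd = `$(Base.julia_cmd()) -t $t $script`
        results[t] = run_with_timeout(cmd, timeout_s)
    end
    return results
end

# Example call, commented out because a full sweep can take a long time:
# sweep("foo.jl", 3:8)
```

Because the freeze is intermittent, a single sweep proves little; repeating it several times per thread count and counting the `:hung` results would give a rough frequency to compare.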

vtjnash added a commit that referenced this issue Jul 1, 2022
vtjnash added a commit that referenced this issue Jul 6, 2022
@tkf tkf closed this as completed in #45899 Jul 7, 2022
tkf pushed a commit that referenced this issue Jul 7, 2022
KristofferC pushed a commit that referenced this issue Jul 7, 2022
Closes #45626, hopefully.

(cherry picked from commit f7e0c7e)
KristofferC pushed a commit that referenced this issue Jul 8, 2022
Closes #45626, hopefully.

(cherry picked from commit f7e0c7e)
ffucci pushed a commit to ffucci/julia that referenced this issue Aug 11, 2022
pcjentsch pushed a commit to pcjentsch/julia that referenced this issue Aug 18, 2022