
More lock contention in Julia 1.6 to 1.8 than in 1.5.1 (I/O with parallelism strictly not the reason) #39598

Closed
PallHaraldsson opened this issue Feb 10, 2021 · 16 comments
Labels: kind:regression (Regression in behavior compared to a previous version), performance (Must go faster)

Comments

@PallHaraldsson
Contributor

PallHaraldsson commented Feb 10, 2021

I ran with Firefox turned off, and all 16 virtual cores were idle when starting (strangely, the load average was still about 1.08):

https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/binarytrees-julia-4.html

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -p4 -O3 bt.jl 21'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -p4 -O3 bt.jl 21
  Time (mean ± σ):     16.958 s ±  0.412 s    [User: 47.435 s, System: 2.919 s]
  Range (min … max):   16.646 s … 17.780 s    10 runs

$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -p4 -O3 bt.jl 21'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -p4 -O3 bt.jl 21
 ⠦ Current estimate: 11.319 s


https://benchmarksgame-team.pages.debian.net/benchmarksgame/program/fasta-julia-7.html

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.648 s ±  0.091 s    [User: 4.712 s, System: 1.102 s]
  Range (min … max):    2.535 s …  2.803 s    10 runs
 
$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.522 s ±  0.052 s    [User: 6.041 s, System: 0.911 s]
  Range (min … max):    2.435 s …  2.582 s    10 runs

https://discourse.julialang.org/t/julia-1-6-0-rc1-and-1-7-with-threads-is-slower-considerations-for-the-benchmark-game/54957

@KristofferC KristofferC added the performance Must go faster label Feb 10, 2021
@JeffBezanson
Sponsor Member

For binarytrees, if you put @time binary_trees(...) inside the script it comes out fine, so I suspect the startup time just got longer.
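For instance (a minimal sketch; the actual entry-point name and arguments in bt.jl may differ):

@time binary_trees(stdout, 21)   # times only the workload, excluding process startup and -p4 worker launch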

@JeffBezanson
Sponsor Member

JeffBezanson commented Feb 12, 2021

Indeed, with something like time ./julia -p4 -e0, startup went from 1.9 sec (v1.5) to 7 sec (v1.6) (!!)
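A rough way to separate the worker-launch part from base startup (sketch; run inside a plain julia --startup-file=no session):

using Distributed
@time addprocs(4)   # approximately the extra cost that -p4 adds on top of plain startup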

@JeffBezanson JeffBezanson added the kind:regression Regression in behavior compared to a previous version label Feb 12, 2021
@JeffBezanson
Sponsor Member

Looks like probably #37816

@PallHaraldsson
Contributor Author

PallHaraldsson commented Mar 10, 2021

The fix, if it's in, seems to cover only Distributed, not Threads. Should this hold up a new RC? EDIT: Looking at my timings, the regression is only minor for the latter, so no, but for the former I would really like it confirmed that the fix is in...

#39977

@KristofferC
Sponsor Member

I would really like it confirmed that the fix is in

Nothing has been changed with respect to this. Removing the precompile signatures in #37816 was intentional since they caused a larger sysimage and invalidations and it isn't clear that it is worth it.
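For context, such a directive looks roughly like this (purely illustrative, not one of the exact signatures removed in #37816):

using Distributed
precompile(Distributed.addprocs, (Int,))   # compile this method for these argument types ahead of its first call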

@PallHaraldsson PallHaraldsson changed the title Two types of parallelism slower in Julia 1.6.0-rc1 Two types of parallelism slower in Julia 1.6.0-rc2 and 1.7 Mar 16, 2021
@PallHaraldsson
Contributor Author

Threading is getting noticeably slower on 1.7 (52% slower than 1.5.1; 20% slower than rc2):

$ hyperfine '~/julia-1.6.0-rc2/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.6.0-rc2/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.922 s ±  0.129 s    [User: 5.522 s, System: 1.151 s]
  Range (min … max):    2.735 s …  3.152 s    10 runs
 
$ hyperfine '~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      3.470 s ±  0.143 s    [User: 6.904 s, System: 1.129 s]
  Range (min … max):    3.300 s …  3.676 s    10 runs
 
$ hyperfine '~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.7-DEV-da96fef327/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      3.401 s ±  0.120 s    [User: 6.702 s, System: 1.094 s]
  Range (min … max):    3.224 s …  3.605 s    10 runs
 
$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.629 s ±  0.187 s    [User: 6.407 s, System: 0.932 s]
  Range (min … max):    2.172 s …  2.877 s    10 runs
 
$ hyperfine '~/julia-1.5.1/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > input25000000.txt'
Benchmark #1: ~/julia-1.5.1/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > input25000000.txt
  Time (mean ± σ):      2.709 s ±  0.075 s    [User: 6.472 s, System: 0.948 s]
  Range (min … max):    2.540 s …  2.818 s    10 runs

This time I'm testing under some load (i.e. Firefox running), but that's also a valuable test.

Looks like probably #37816

Maybe, but that only covers the other kind of slowdown (which isn't fixed on 1.7).

@giordano
Contributor

With 4 threads:

julia> @btime fasta(25000000, $devnull)
  843.131 ms (78236 allocations: 960.04 MiB) # Version 1.5.4 (2021-03-11)
  894.791 ms (83410 allocations: 958.41 MiB) # Version 1.6.0-rc2 (2021-03-11)
  897.782 ms (83951 allocations: 958.42 MiB) # Version 1.7.0-DEV.714 (2021-03-16)

I think you're mostly benchmarking I/O?

Also, I know you love hyperfine (and it's a great tool to benchmark applications!), but it always includes Julia startup time.
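One way to gauge how much of those numbers is pure startup is to benchmark an empty program with the same flags and subtract, e.g. (reusing the flags from this thread):

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 -e0'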

@PallHaraldsson
Contributor Author

PallHaraldsson commented Mar 16, 2021

Yes, I kind of love hyperfine! And it's good to know the real cause, but that cause is also important to fix.

When I looked into I/O, Julia seemed to have too much locking (I tried, unsuccessfully, to reduce it). Julia's I/O is a known problem, i.e. slower than in other languages, but it still shouldn't be more of a problem in later Julia versions.
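(For reference, the usual way to cut that contention is to keep writes task-local and touch the shared stream only once per chunk; a minimal sketch, not the fasta.jl code:)

function write_chunks(io, n)
    bufs = [IOBuffer() for _ in 1:n]
    Threads.@threads for i in 1:n
        print(bufs[i], "chunk $i\n")          # task-local buffer, no lock contention
    end
    foreach(b -> write(io, take!(b)), bufs)   # single writer, output stays in order
end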

Without threading, the number of allocations is exactly the same on every run ("71.69 k allocations"), even across versions 1.5.1 and 1.7. With threading on it varies from run to run, both on older versions and here on 1.7:

julia> @btime fasta(25000000, $devnull)  # does @btime report the run with the fewest allocations? The count isn't always the same but tends to be on the low side, unlike with @benchmark; likely a fluke
  1.316 s (79269 allocations: 958.28 MiB)

julia> @time fasta(25000000, devnull)
  1.348268 seconds (90.21 k allocations: 958.616 MiB, 2.05% gc time)

julia> @time fasta(25000000, devnull)
  1.436310 seconds (82.22 k allocations: 958.372 MiB, 0.98% gc time)

julia> @benchmark fasta(25000000, $devnull)
BenchmarkTools.Trial: 
  memory estimate:  958.56 MiB
  allocs estimate:  88238
  --------------
  minimum time:     1.229 s (0.86% GC)
  median time:      1.316 s (0.80% GC)
  mean time:        1.312 s (0.85% GC)
  maximum time:     1.387 s (0.74% GC)
  --------------
  samples:          4
  evals/sample:     1

@PallHaraldsson PallHaraldsson changed the title Two types of parallelism slower in Julia 1.6.0-rc2 and 1.7 Two types of parallelism slower in Julia 1.6.0 and 1.7 Mar 26, 2021
@PallHaraldsson PallHaraldsson changed the title Two types of parallelism slower in Julia 1.6.0 and 1.7 I/O (with parallelism) slower in Julia 1.6.0 and 1.7 Mar 26, 2021
@PallHaraldsson
Contributor Author

PallHaraldsson commented Mar 26, 2021

FYI: 1.6.0 is actually faster up to 2 threads; it's just that 1.5.1 keeps getting faster up to about 7 threads, while 1.6 gets slower:

$ hyperfine '/home/pharaldsson_sym/julia-1.6.0/bin/julia --startup-file=no -t2 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: /home/pharaldsson_sym/julia-1.6.0/bin/julia --startup-file=no -t2 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.326 s ±  0.050 s    [User: 4.107 s, System: 0.610 s]
  Range (min … max):    2.273 s …  2.418 s    10 runs
 
$ hyperfine '/home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t2 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: /home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t2 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.687 s ±  0.089 s    [User: 5.094 s, System: 0.573 s]
  Range (min … max):    2.570 s …  2.876 s    10 runs


$ hyperfine '/home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t7 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: /home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t7 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.101 s ±  0.051 s    [User: 9.770 s, System: 0.643 s]
  Range (min … max):    2.047 s …  2.210 s    10 runs

@PallHaraldsson
Contributor Author

PallHaraldsson commented Jun 29, 2021

I hit a strange Julia error, but first the timings, in chronological order.

1.8 is not better (and 1.7 is similar):

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -p4 -O3 bt.jl 21'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -p4 -O3 bt.jl 21
  Time (mean ± σ):     19.799 s ±  0.806 s    [User: 53.618 s, System: 3.156 s]
  Range (min … max):   18.955 s … 21.475 s    10 runs

while it is better with -O0:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -p4 -O0 bt.jl 21'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -p4 -O0 bt.jl 21
 ⠋ Current estimate: 14.140 s

These numbers are with Firefox running; without it I get at least 1 sec better. julia-1.6.0-rc2 may be slightly faster under the same conditions (I tried suspending the main Firefox process, though its "Web Content" processes kept running), and julia-1.5.1 almost certainly still is, getting down to 10 sec with -O0.

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.806 s ±  0.113 s    [User: 7.977 s, System: 0.863 s]
  Range (min … max):    2.641 s …  2.940 s    10 runs

$ hyperfine '~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.6.0-rc1/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.221 s ±  0.082 s    [User: 5.230 s, System: 0.763 s]
  Range (min … max):    2.137 s …  2.413 s    10 runs

$ hyperfine '/home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null'
Benchmark #1: /home/pharaldsson_sym/julia-1.5.1_old/bin/julia --startup-file=no -t4 -O3 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.072 s ±  0.062 s    [User: 6.430 s, System: 0.585 s]
  Range (min … max):    1.980 s …  2.169 s    10 runs

Now the strange error. First, a run that worked fine:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      2.577 s ±  0.092 s    [User: 7.293 s, System: 0.825 s]
  Range (min … max):    2.452 s …  2.774 s    10 runs

[..]

then, out of the blue, it ran the test only once and hung indefinitely after spawning one julia that ran fine:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
 ⠹ Current estimate: 2.623 s      ███████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:21
^C
$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
 ⠋ Current estimate: 2.843 s      ███████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:23
^C

it also hung on the second try above; then it was OK again:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
  Time (mean ± σ):      3.755 s ±  0.099 s    [User: 8.707 s, System: 0.853 s]
  Range (min … max):    3.593 s …  3.894 s    10 runs

in top I saw:
 8403 pharald+  20   0 2164408 450400 108212 R 399,7  1,4  22:48.48 julia                                                                                              
 8683 pharald+  20   0 1820268 321512  89952 R 399,7  1,0  11:52.85 julia  

These are still running, fan on high. Later, with more testing:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND 
 8683 pharald+  20   0 2071668 315588  67440 R 399,7  1,0  29:35.48 julia                                                                                              
 8403 pharald+  20   0 2164408 408064  65876 R 399,0  1,2  40:31.06 julia                                                                                              
 9312 pharald+  20   0  880732 259356  97936 R  99,7  0,8   0:07.83 julia                                                                                              
 9314 pharald+  20   0  815684 244520  98172 R  99,7  0,7   0:07.84 julia                                                                                              
 9313 pharald+  20   0  881296 361168  97920 R  99,3  1,1   0:07.66 julia                                                                                              
 9315 pharald+  20   0  881292 363512  97884 R  98,3  1,1   0:07.50 julia                                                                                              
 9316 pharald+  20   0  880612 280092  98240 R  95,0  0,9   0:07.71 julia 

The additional 5 julia processes were from another -p5 test and went away on their own. hyperfine naturally spawns such sh -c ~/julia... wrappers; for the older hyperfine runs, hyperfine itself went away, but not the julia it spawned:

$ ps -ef |grep 8402
pharald+  8402 18133  0 13:31 pts/11   00:00:00 sh -c ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
pharald+  8403  8402 99 13:31 pts/11   00:56:49 /home/pharaldsson_sym/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000

$ ps -ef |grep 8682
pharald+  8682 18133  0 13:34 pts/11   00:00:00 sh -c ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
pharald+  8683  8682 99 13:34 pts/11   00:47:11 /home/pharaldsson_sym/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000

Might this be a hint about the slowness? I'll hold off a bit on killing these processes. The fan is annoying and I doubt much can be debugged from them, but let me know if they're useful.

Could this have anything to do with memory running out? I see the following in /var/log/syslog from BEFORE those julia processes started (I'm still puzzled by this; I believe I haven't used julia today except for benchmarking):

Jun 29 12:44:48 SYMLINUX011 kernel: [9930229.685117] julia[22480]: segfault at 8 ip 00007fab80dd2e47 sp 00007fab7a514c10 error 4 in libLLVM-12jl.so[7fab8006c000+3e24000]
Jun 29 12:44:48 SYMLINUX011 kernel: [9930229.685126] Code: 4b 10 44 88 4b 18 48 89 d8 5b 5d 41 5c c3 0f 1f 44 00 00 48 8b 48 08 45 31 c9 48 8b 50 10 49 89 ca 49 83 e2 f8 4c 39 d5 74 41 <48> 3b 6a 08 41 b8 00 00 00 00 49 0f 44 d0 eb ae 66 0f 1f 84 00 00
Jun 29 12:45:01 SYMLINUX011 CRON[22781]: (root) CMD (command -v debian-sa1 > /dev/null && debian-sa1 1 1)
Jun 29 12:49:55 SYMLINUX011 kernel: [9930537.005870] julia[24365]: segfault at 34438e8 ip 00007fd720b2004e sp 00007fd6c1c02450 error 4 in libjulia-internal.so.1.7[7fd720a74000+2c4000]
Jun 29 12:49:55 SYMLINUX011 kernel: [9930537.005890] Code: 39 06 74 4b 8b 78 40 44 89 e2 49 83 c5 01 48 0f ca 48 31 d7 e8 d3 a4 12 00 49 39 dd 41 89 c4 0f 84 3f 01 00 00 4b 8b 7c ee f8 <48> 8b 47 f8 48 83 e0 f0 45 85 ff 74 cd 48 8b 35 d6 c8 41 00 48 39
Jun 29 12:51:13 SYMLINUX011 kernel: [9930614.727473] julia[24935]: segfault at 34438e8 ip 00007fd6b3b0b256 sp 00007fd654b907c0 error 4 in libjulia-internal.so.1.7[7fd6b39dd000+2c4000]
Jun 29 12:51:13 SYMLINUX011 kernel: [9930614.727484] Code: 06 0f 85 79 ff ff ff e9 ab fe ff ff 0f 1f 84 00 00 00 00 00 55 48 89 f8 48 89 e5 41 56 41 55 41 54 53 48 83 ec 30 48 89 7d b8 <48> 8b 7f f8 48 83 e7 f0 48 39 f7 74 7d 4c 8b 35 06 a5 39 00 49 39

[..]

Jun 29 14:01:24 SYMLINUX011 snapd[7021]: storehelpers.go:551: cannot refresh: snap has no updates available: "core", "core18", "core20", "cpustat", "dotnet-sdk", "ffmpeg", "firefox", "gnome-3-28-1804", "gnome-3-34-1804", "gnome-calculator", "gnome-characters", "gnome-logs", "gnome-system-monitor", "gtk-common-themes", "julia-mrcinv", "ldc2", "libreoffice", "lxd", "mycroft", "peco", "slack", "snap-store"

Soon after (it's now 14:17, so some minutes ago) the Mozilla Crash Reporter appeared. Firefox often just crashes when memory runs out (OOM on Linux). I don't see that happening in the log for julia or hyperfine (nor for Firefox, which is unusual; I would usually notice, though possibly not this time since the crash reporter kicked in). I stopped hyperfine myself, and julia didn't get killed. Could running out of memory still be a factor? Are other processes involved, or just the Linux kernel?

I get "no"/"same" output for either julia process:

$ sudo strace -p 8403
strace: Process 8403 attached
strace: [ Process PID=8403 runs in x32 mode. ]

@PallHaraldsson
Contributor Author

PallHaraldsson commented Jun 29, 2021

Same problem as in the previous comment, but now it hung earlier:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null
 ⠙ Initial time measurement       ░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:00
^C

and while it hung I got no strace output, but then all of this appeared after Ctrl-C:

$ sudo strace -p 14884
strace: Process 14884 attached
strace: [ Process PID=14884 runs in x32 mode. ]
--- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_TKILL, si_pid=14884, si_uid=1000} ---
strace: [ Process PID=14884 runs in 64 bit mode. ]
futex(0x7f6a3da62428, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f6a3da62480, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f6a3da6246c, FUTEX_WAIT_PRIVATE, 0, NULL) = 0
futex(0x7f6a3da6242c, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f6a3da62480, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[INT QUIT ABRT USR1 SEGV TERM]}) = 1
--- SIGUSR2 {si_signo=SIGUSR2, si_code=SI_TKILL, si_pid=14884, si_uid=1000} ---
futex(0x7f6a3da62428, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f6a3da62480, FUTEX_WAKE_PRIVATE, 1) = 1
futex(0x7f6a3da6242c, FUTEX_WAKE_PRIVATE, 2147483647) = 1
futex(0x7f6a3da62480, FUTEX_WAKE_PRIVATE, 1) = 1
rt_sigreturn({mask=[INT QUIT ABRT USR1 SEGV TERM]}) = 1
rt_sigprocmask(SIG_UNBLOCK, [QUIT ILL ABRT BUS SEGV TERM], NULL, 8) = 0
futex(0x7f6a3e29f9e0, FUTEX_WAKE_PRIVATE, 2147483647) = 0
openat(AT_FDCWD, "/usr/share/locale/locale.alias", O_RDONLY|O_CLOEXEC) = 16
fstat(16, {st_mode=S_IFREG|0644, st_size=2995, ...}) = 0
read(16, "# Locale name alias data base.\n#"..., 4096) = 2995
read(16, "", 4096)                      = 0
close(16)                               = 0
openat(AT_FDCWD, "/usr/share/locale/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en_US/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/share/locale-langpack/en/LC_MESSAGES/libc.mo", O_RDONLY) = -1 ENOENT (No such file or directory)
write(2, "\nsignal (2): Interrupt\n", 23) = 23
write(2, "in expression starting at /home/"..., 80) = 80
openat(AT_FDCWD, "/home/pharaldsson_sym/julia-1.8-DEV-7553ca13cc/bin/../lib/julia/libjulia-internal.so.1", O_RDONLY|O_CLOEXEC) = 16
fstat(16, {st_mode=S_IFREG|0755, st_size=41432960, ...}) = 0
mmap(NULL, 41432960, PROT_READ, MAP_PRIVATE, 16, 0) = 0x7f69cd6fd000
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
close(16)                               = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
brk(0x4c9b000)                          = 0x4c9b000
write(2, "_mm_pause at /usr/local/lib/gcc/"..., 93) = 93
write(2, "jl_safepoint_wait_gc at /buildwo"..., 86) = 86
write(2, "jl_set_gc_and_wait at /buildwork"..., 99) = 99
write(2, "segv_handler at /buildworker/wor"..., 81) = 81
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/libpthread.so.0", O_RDONLY|O_CLOEXEC) = 16
fstat(16, {st_mode=S_IFREG|0755, st_size=144976, ...}) = 0
mmap(NULL, 144976, PROT_READ, MAP_PRIVATE, 16, 0) = 0x7f6a24000000
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
close(16)                               = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/f36706eb2e6eee4046c4fdca2a19540b2f6113.debug", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/lib/x86_64-linux-gnu/.debug/f36706eb2e6eee4046c4fdca2a19540b2f6113.debug", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/usr/lib/debug//lib/x86_64-linux-gnu/f36706eb2e6eee4046c4fdca2a19540b2f6113.debug", O_RDONLY|O_CLOEXEC) = -1 ENOENT (No such file or directory)
write(2, "_IO_funlockfile at /lib/x86_64-l"..., 72) = 72
openat(AT_FDCWD, "/home/pharaldsson_sym/julia-1.8-DEV-7553ca13cc/lib/julia/sys.so", O_RDONLY|O_CLOEXEC) = 16
fstat(16, {st_mode=S_IFREG|0755, st_size=197785288, ...}) = 0
mmap(NULL, 197785288, PROT_READ, MAP_PRIVATE, 16, 0) = 0x7f69b4360000
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
close(16)                               = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
write(2, "sigatomic_end at ./c.jl:437 [inl"..., 38) = 38
write(2, "task_done_hook at ./task.jl:529\n", 32) = 32
write(2, "unknown function (ip: (nil))\n", 29) = 29
write(2, "jl_mutex_trylock_nogc at /buildw"..., 92) = 92
write(2, "multiq_deletemin at /buildworker"..., 88) = 88
write(2, "get_next_task at /buildworker/wo"..., 75) = 75
write(2, "jl_task_get_next at /buildworker"..., 78) = 78
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
write(2, "poptask at ./task.jl:800\n", 25) = 25
rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [INT USR1], 8) = 0
rt_sigprocmask(SIG_SETMASK, [INT USR1], NULL, 8) = 0
write(2, "wait at ./task.jl:808 [inlined]\n", 32) = 32
write(2, "task_done_hook at ./task.jl:531\n", 32) = 32
write(2, "_jl_invoke at /buildworker/worke"..., 80) = 80
write(2, "jl_apply_generic at /buildworker"..., 76) = 76
write(2, "jl_apply at /buildworker/worker/"..., 81) = 81
write(2, "jl_finish_task at /buildworker/w"..., 75) = 75
write(2, "start_task at /buildworker/worke"..., 71) = 71
write(2, "unknown function (ip: (nil))\n", 29) = 29
write(2, "jl_gc_wait_for_the_world at /bui"..., 93) = 93
write(2, "jl_gc_collect at /buildworker/wo"..., 73) = 73
write(2, "maybe_collect at /buildworker/wo"..., 82) = 82
write(2, "jl_gc_pool_alloc at /buildworker"..., 76) = 76
write(2, "jl_gc_alloc_ at /buildworker/wor"..., 93) = 93
write(2, "_new_array_ at /buildworker/work"..., 83) = 83
write(2, "_new_array at /buildworker/worke"..., 82) = 82
write(2, "jl_alloc_array_1d at /buildworke"..., 79) = 79
write(2, "Array at ./boot.jl:457 [inlined]"..., 33) = 33
write(2, "generate_chunks at /home/pharald"..., 72) = 72
write(2, "#9 at ./threadingconstructs.jl:1"..., 35) = 35
write(2, "unknown function (ip: 0x7f6a178e"..., 38) = 38
write(2, "_jl_invoke at /buildworker/worke"..., 80) = 80
write(2, "jl_apply_generic at /buildworker"..., 76) = 76
write(2, "jl_apply at /buildworker/worker/"..., 81) = 81
write(2, "start_task at /buildworker/worke"..., 71) = 71
write(2, "unknown function (ip: (nil))\n", 29) = 29
write(2, "jl_safepoint_wait_gc at /buildwo"..., 86) = 86
write(2, "jl_set_gc_and_wait at /buildwork"..., 99) = 99
write(2, "segv_handler at /buildworker/wor"..., 81) = 81
write(2, "_IO_funlockfile at /lib/x86_64-l"..., 72) = 72
write(2, "jl_gc_safepoint at /buildworker/"..., 77) = 77
write(2, "unknown function (ip: (nil))\n", 29) = 29
write(2, "Allocations: 809887 (Pool: 80970"..., 52) = 52
--- SIGSEGV {si_signo=SIGSEGV, si_code=SEGV_ACCERR, si_addr=0x7f6a3e8d0000} ---

Just after, I could run it without hyperfine, and then:

$ time ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null

real	0m2,725s
user	0m7,362s
sys	0m0,827s

I doubt this is a hyperfine problem, since that run worked fine. Still, I can't rule it out, as I've only seen the problem with it. Is an interaction likely, given that it seems to just run the program in a shell? Actually, trying exactly that may reproduce the problem, as this just hung:

$ time sh -c ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta.jl 25000000 > /dev/null

@PallHaraldsson
Contributor Author

PallHaraldsson commented Jun 30, 2021

#41411 likely fixes the threading hang (untested), but I tested commenting out the printing ("write" and "unsafe_write") statements, and fasta.jl went from 2.793 sec (min) to 2.401 sec (min), a 14% reduction. I took out some wait statements (not all) and the time went further down to 2.207 sec (min), and I still got hangs. If I also take out the last two waits, i.e. the following lines, the time goes down to 497.0 ms (min), an 82% reduction, but I think the program then exits before the threads have finished running the code.

rtask === nothing || wait(rtask)

and

wait(io3)
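For illustration, a minimal sketch of why those waits matter (hypothetical code, not the fasta.jl source):

t = Threads.@spawn write(io3, produce_output())   # produce_output() is a placeholder
wait(t)   # without this, the script can return and the process exit before the write finishes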

At some point after that (I think I had eliminated all the waits by then, but I'm not sure; I was testing back and forth) I got:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null
 ⠧ Current estimate: 1.476 s      ███████████████████████████████████████████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:10
Error: Command terminated with non-zero exit code. Use the '-i'/'--ignore-failure' option if you want to ignore this. Alternatively, use the '--show-output' option to debug what went wrong.

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null
  Time (mean ± σ):      1.569 s ±  0.169 s    [User: 1.144 s, System: 0.244 s]
  Range (min … max):    1.391 s …  1.861 s    10 runs

Then julia processes were left running and the load went up to 30, and I started seeing "Measuring shell spawning time" and its progress bar for a long time before "Initial time measurement", along with high measurements:

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null
 ⠋ Current estimate: 5.828 s      ███████████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░ ETA 00:00:53

@PallHaraldsson
Contributor Author

@vtjnash Continuing from the previous comment, I tried uncommenting the write statements again while keeping the wait statements commented out. Predictably, I get the output in a somewhat garbled order, but at least it's fast... so the I/O itself isn't strictly slow, nor does it seem to be the cause of the hang; the lock contention is. And removing the waits seems to have eliminated the hang.

$ hyperfine '~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null'
Benchmark #1: ~/julia-1.8-DEV-7553ca13cc/bin/julia --startup-file=no -t4 -O2 fasta_wo_io.jl 25000000 > /dev/null
  Time (mean ± σ):     665.7 ms ± 100.4 ms    [User: 1.480 s, System: 0.554 s]
  Range (min … max):   545.0 ms … 814.5 ms    10 runs
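(A common pattern for ordered output without every worker contending on the stream is a single dedicated writer task; a sketch under that assumption, not how fasta.jl is written:)

ch = Channel{Vector{UInt8}}(32)
writer = Threads.@spawn for chunk in ch
    write(stdout, chunk)            # only this task touches stdout, so chunks never interleave
end
# workers do put!(ch, chunk); chunks come out whole, in put! order
# close(ch); wait(writer)           # these waits are still needed, or output may be cut short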

@PallHaraldsson PallHaraldsson changed the title I/O (with parallelism) slower in Julia 1.6.0 and 1.7 More lock contention in Julia 1.6 to 1.8 than in 1.5 (I/O with parallelism strictly not the reason) Jun 30, 2021
@PallHaraldsson PallHaraldsson changed the title More lock contention in Julia 1.6 to 1.8 than in 1.5 (I/O with parallelism strictly not the reason) More lock contention in Julia 1.6 to 1.8 than in 1.5.1 (I/O with parallelism strictly not the reason) Jun 30, 2021
@PallHaraldsson
Contributor Author

PallHaraldsson commented Sep 15, 2021

I just downloaded a nightly and the "user" time seems to be getting worse, while the "real" time is maybe slightly better:

$ time ~/Downloads/julia-f8d3bd22f6/bin/julia --startup-file=no -p1 -e0

real	0m8,929s
user	0m14,464s
sys	0m1,242s

$ time ~/julia-1.7.0-beta3-e76c9dad42/bin/julia --startup-file=no -p1 -e0

real	0m9,206s
user	0m9,181s
sys	0m0,605s

@PallHaraldsson
Contributor Author

PallHaraldsson commented Oct 22, 2021

The "real" time is getting a little better on the latest master, while with --compile=min much better:

$ time ~/julia-1.8-DEV-991d6e6cb8/bin/julia --startup-file=no -p1 -e0

real	0m7.156s
user	0m12.328s
sys	0m2.226s

$ time ~/julia-1.8-DEV-991d6e6cb8/bin/julia --startup-file=no -p1 --compile=min -e0

real	0m2.637s
user	0m4.254s
sys	0m2.207s


$ time ~/julia-1.7.0-rc1/bin/julia --startup-file=no -p1 -e0

real	0m7.255s
user	0m11.906s
sys	0m1.157s

$ time ~/julia-1.7.0-rc1/bin/julia --startup-file=no -p1 --compile=min -e0

real	0m2.690s
user	0m3.560s
sys	0m1.156s


$ time ~/julia-1.7.0-beta3-e76c9dad42/bin/julia --startup-file=no -p1 -e0

real	0m7.407s
user	0m7.548s
sys	0m0.521s

$ time ~/julia-1.7.0-beta3-e76c9dad42/bin/julia --startup-file=no -p1 --compile=min -e0

real	0m2.463s
user	0m3.675s
sys	0m0.921s


$ time ~/julia-1.8-DEV-32-bit-609a4a0a1e/bin/julia --startup-file=no -p1 -e0

real	0m7.964s
user	0m13.176s
sys	0m1.168s

$ time ~/julia-1.8-DEV-32-bit-609a4a0a1e/bin/julia --startup-file=no -p1 --compile=min -e0

real	0m2.597s
user	0m3.666s
sys	0m1.053s

@vtjnash
Sponsor Member

vtjnash commented Aug 24, 2023

I think this is in our nanosoldier basebenchmarks, so we will notice any changes.

@vtjnash vtjnash closed this as completed Aug 24, 2023