RFC: a safepoint in yield #40473

Merged: 1 commit into JuliaLang:master on Apr 14, 2021
Conversation

tkf (Member) commented on Apr 14, 2021

Currently, to avoid deadlocks, we need to manually insert GC.safepoint() next to yield() in non-allocating tasks when multiple tasks communicate through low-level concurrency primitives such as atomics. I find it confusing that yield() does not imply GC.safepoint(), so that you have to write both. My mental model of yield is "let the julia runtime do whatever it needs to do at the moment", and GC is a part of the julia runtime. Since it is possible to write non-allocating concurrent tasks, maybe it makes sense to invoke GC.safepoint() at the locations where the programmer has already marked that a context switch is OK? As yield() is more than 100 times as expensive as GC.safepoint(), I think the performance impact is negligible.
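
To make that concrete, here is a minimal sketch of the pattern (the function and argument names are made up for illustration; this is not code from the patch):

function busywait(stop::Threads.Atomic{Bool})
    while !stop[]
        GC.safepoint()  # needed today: lets a GC.gc() started on another thread proceed
        yield()         # lets other tasks run; with this patch it would also reach a safepoint
    end
end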

More specifically, this patch adds a single unconditional safepoint in jl_process_events.

MWE

An easy way to observe this is with a very naive spin lock. (This PR is not advocating spin locks, though.)

core_flush(io) = ccall(:jl_uv_flush, Cvoid, (Ptr{Cvoid},), Core.io_pointer(io))

function mwe(; with_yield = true, with_safepoint = true)
    @assert Threads.nthreads() > 1
    println()
    println("with_safepoint = $with_safepoint with_yield = $with_yield")

    scheduled = Threads.Atomic{Bool}(false)
    started = Threads.Atomic{Bool}(false)
    done = Threads.Atomic{Bool}(false)
    n = Threads.nthreads()
    @sync try
        Threads.@threads :static for _ in 1:Threads.nthreads()
            Threads.threadid() == 1 && continue
            Threads.@async begin
                if !Threads.atomic_xchg!(scheduled, true)
                    started[] = true
                    Core.print("$(Threads.threadid()): spinning...\n")
                    while !done[]
                        with_safepoint && GC.safepoint()
                        with_yield && yield()
                    end
                end
                Core.print("$(Threads.threadid()): DONE\n")
            end
        end
        while !started[]
            GC.safepoint()
            yield()
        end
        Core.print("GC.gc()...\n")
        core_flush(Core.stdout)
        GC.gc()
    catch err
        @error "root" exception = (err, catch_backtrace())
        rethrow()
    finally
        Core.print("stopping...\n")
        done[] = true
    end
end

mwe()
mwe(with_yield = false)
mwe(with_safepoint = false)
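
For reference, one way to launch the script above with two threads (assuming it is saved as mwe.jl; the file name is arbitrary):

$ julia --threads=2 mwe.jl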

Running this script with 2 threads prints

with_safepoint = true with_yield = true
GC.gc()...
2: spinning...
stopping...
2: DONE

with_safepoint = true with_yield = false
2: spinning...
GC.gc()...
stopping...
2: DONE

with_safepoint = false with_yield = true
GC.gc()...
2: spinning...

and then hangs: with yield() alone, the spinning task never reaches a GC safepoint, so GC.gc() on thread 1 blocks forever waiting for the other thread to stop. That is to say, yield alone is not enough to trigger the safepoint. This patch fixes the deadlock.

Benchmarking yield and GC.safepoint
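
The REPL session below assumes BenchmarkTools.jl has been loaded, since @btime and @benchmark come from that package:

julia> using BenchmarkTools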

julia> @btime yield()
  435.854 ns (0 allocations: 0 bytes)

julia> @btime GC.safepoint()
  1.809 ns (0 allocations: 0 bytes)

Invoking them in (more or less) parallel does not change the rough estimate:

julia> function foreach_thread(f, n = Threads.nthreads())
           ys = Vector{Any}(undef, n)
           Threads.@threads :static for i in 1:n
               ys[i] = f(i)
           end
           return ys
       end
foreach_thread (generic function with 2 methods)

julia> by = foreach_thread() do _
           @benchmark yield()
       end
8-element Vector{Any}:
 Trial(426.213 ns)
 Trial(460.337 ns)
 Trial(962.747 ns)
 Trial(604.731 ns)
 Trial(600.040 ns)
 Trial(596.795 ns)
 Trial(227.000 ns)
 Trial(963.750 ns)

julia> bs = foreach_thread() do _
           @benchmark GC.safepoint()
       end
8-element Vector{Any}:
 Trial(1.809 ns)
 Trial(1.809 ns)
 Trial(1.809 ns)
 Trial(1.809 ns)
 Trial(1.509 ns)
 Trial(1.509 ns)
 Trial(1.509 ns)
 Trial(1.509 ns)

@tkf tkf requested a review from vtjnash April 14, 2021 00:38
@tkf tkf added the domain:multithreading Base.Threads and related functionality label Apr 14, 2021
StefanKarpinski (Sponsor Member) commented:

Very naive opinion, this seems like a good idea to me.

vtjnash (Sponsor Member) left a comment:

Yeah, this seems like a reasonable place to put it for now

@vtjnash vtjnash merged commit 7112c89 into JuliaLang:master Apr 14, 2021
@tkf tkf deleted the yield-safepoint branch April 14, 2021 06:09
ElOceanografo pushed a commit to ElOceanografo/julia that referenced this pull request May 4, 2021
yield is already a potential safepoint, this just ensures it is always one
antoine-levitt pushed a commit to antoine-levitt/julia that referenced this pull request May 9, 2021
yield is already a potential safepoint, this just ensures it is always one
johanmon pushed a commit to johanmon/julia that referenced this pull request Jul 5, 2021
yield is already a potential safepoint, this just ensures it is always one