Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minor improvements to nonblocking synchronization. #2272

Merged
merged 1 commit into from
Feb 22, 2024
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 18 additions & 7 deletions lib/cudadrv/synchronization.jl
Original file line number Diff line number Diff line change
Expand Up @@ -110,6 +110,7 @@ const SyncObject = Union{CuContext, CuStream, CuEvent}
const MAX_SYNC_THREADS = 4
const sync_channels = Array{BidirectionalChannel{SyncObject,CUresult}}(undef, MAX_SYNC_THREADS)
const sync_channel_cursor = Threads.Atomic{UInt32}(1)
const sync_channel_lock = Base.ReentrantLock()

function synchronization_worker(data)
i = Int(data)
Expand All @@ -133,15 +134,25 @@ function synchronization_worker(data)
end

@noinline function create_synchronization_worker(i)
sync_channels[i] = BidirectionalChannel{SyncObject,CUresult}()
# should be safe to assign before threads are running;
# any user will just submit work that makes it block
lock(sync_channel_lock) do
# test and test-and-set
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TATAS doesn't look safe here to me, since BidirectionalChannel is a struct, not a pointer, so it doesn't have atomic updates in an array. Maybe make it mutable for now? This is the reason for JuliaLang/julia#51495, but we don't have that merged yet. It could also now be an AtomicMemory object, but (a) that is only in v1.11 (b) it doesn't have the public API written yet (c) would incur a lock allocation for every element and every read or write access, which seems undesired here

if isassigned(sync_channels, i)
return
end

# should be safe to assign before threads are running;
# any user will just submit work that makes it block
sync_channels[i] = BidirectionalChannel{SyncObject,CUresult}()

# we don't know what the size of uv_thread_t is, so reserve enough space
tid = Ref{NTuple{32, UInt8}}(ntuple(i -> 0, 32))
# we don't know what the size of uv_thread_t is, so reserve enough space
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, uv_thead_t is a Csize_t AFAIK, and requires alignment as such, which this does not have

tid = Ref{NTuple{32, UInt8}}(ntuple(i -> 0, 32))

cb = @cfunction(synchronization_worker, Cvoid, (Ptr{Cvoid},))
@ccall uv_thread_create(tid::Ptr{Cvoid}, cb::Ptr{Cvoid}, Ptr{Cvoid}(i)::Ptr{Cvoid})::Int32
cb = @cfunction(synchronization_worker, Cvoid, (Ptr{Cvoid},))
err = @ccall uv_thread_create(tid::Ptr{Cvoid}, cb::Ptr{Cvoid}, Ptr{Cvoid}(i)::Ptr{Cvoid})::Cint
err == 0 || Base.uv_error("uv_thread_create", err)
@ccall uv_thread_detach(tid::Ptr{Cvoid})::Cint
err == 0 || Base.uv_error("uv_thread_detach", err)
end

return
end
Expand Down