
Make Ref mutable on the GPU. #2109

Merged: 5 commits, merged Nov 7, 2023
Conversation

maleadt (Member) commented Oct 4, 2023

As requested by @utkarsh530:

julia> x = Ref(0)
Base.RefValue{Int64}(0)

julia> function kernel(ref)
           ref[] = threadIdx().x
           return
       end
kernel (generic function with 1 method)

julia> @cuda kernel(x)
CUDA.HostKernel for kernel(CUDA.CuRefValue{Int64})

julia> x
Base.RefValue{Int64}(1)

I'm not convinced we want this, because it requires additional API operations when launching a kernel (to pin the Ref's memory so the GPU can access it). Right now, we essentially pass the Ref as a Tuple, which requires no API operations but of course makes it immutable. We don't support nested containers for the same reason.
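For contrast, the pre-existing way to get mutable state on the device is to pass a device array rather than a Ref; a one-element CuArray lives in device memory, so no launch-time pinning is needed (a hedged sketch, not code from this PR):

julia> y = CuArray([0]);  # device-resident one-element buffer

julia> function kernel!(a)
           a[1] = threadIdx().x  # mutate device memory directly
           return
       end

julia> @cuda kernel!(y)

julia> Array(y)
1-element Vector{Int64}:
 1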

Fixes #267

@maleadt added labels: enhancement (New feature or request), cuda kernels (Stuff about writing CUDA kernels.) on Oct 4, 2023
maleadt (Member, Author) commented Nov 6, 2023

Looking into the failures: the problem is that Ref is also used in the context of broadcast, e.g. as RefValue{CuArray}, which has to be materialized as RefValue{CuDeviceArray}. That, however, breaks the ability to replace the Ref's own memory...

So we need to do something different based on the assumed user intent: either we differentiate based on whether the contents are isbits, or we only perform the Ref content conversion in the context of broadcast.
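The broadcast use in question treats the Ref as a 0-dimensional "scalar" container so its contents aren't iterated over, e.g. (a hedged illustration, not from the PR):

julia> a = CuArray(1:3);

julia> b = CuArray([2, 4]);

julia> a .∈ Ref(b)  # Ref marks `b` as a broadcast scalar; `b` itself must become a CuDeviceArray in the kernel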

codecov bot commented Nov 7, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison: base (6a8293b) at 72.34% vs. head (c062cfd) at 60.49%.

❗ The current head c062cfd differs from the pull request's most recent head f0d00cd. Consider uploading reports for commit f0d00cd to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2109       +/-   ##
===========================================
- Coverage   72.34%   60.49%   -11.85%     
===========================================
  Files         159      154        -5     
  Lines       14513    14043      -470     
===========================================
- Hits        10500     8496     -2004     
- Misses       4013     5547     +1534     
Files Coverage Δ
src/compiler/execution.jl 87.07% <86.66%> (-0.34%) ⬇️
lib/cudadrv/memory.jl 77.37% <0.00%> (-1.82%) ⬇️

... and 57 files with indirect coverage changes


@maleadt maleadt marked this pull request as ready for review November 7, 2023 09:54
@maleadt maleadt merged commit 0f3313d into master Nov 7, 2023
1 check was pending
@maleadt maleadt deleted the tb/ref branch November 7, 2023 11:25
maleadt added a commit that referenced this pull request Dec 21, 2023
Turns out broadcast uses ephemeral Ref boxes to pass scalars,
which get freed rapidly and can lead to illegal memory accesses
when the broadcast kernel accesses them.
@maleadt maleadt mentioned this pull request Dec 21, 2023
kshyatt pushed a commit that referenced this pull request Dec 27, 2023
maleadt added a commit that referenced this pull request Jan 6, 2024
Labels
cuda kernels (Stuff about writing CUDA kernels.), enhancement (New feature or request)

Development
Successfully merging this pull request may close these issues: Make Ref pass by-reference