
Make Ref mutable on the GPU. #2109

Merged: 5 commits, merged Nov 7, 2023
Conversation

maleadt (Member) commented Oct 4, 2023

As requested by @utkarsh530:

julia> x = Ref(0)
Base.RefValue{Int64}(0)

julia> function kernel(ref)
           ref[] = threadIdx().x
           return
       end
kernel (generic function with 1 method)

julia> @cuda kernel(x)
CUDA.HostKernel for kernel(CUDA.CuRefValue{Int64})

julia> x
Base.RefValue{Int64}(1)

I'm not convinced we want this, because it requires additional API operations when launching a kernel (to pin the Ref's memory so the GPU can access it). Right now, we essentially pass the Ref as a Tuple, which requires no API operations but of course makes it immutable. We don't support nested containers for the same reason.
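For contrast, the pre-existing way to get mutable state on the device is to pass a device array rather than a Ref; a one-element CuArray lives in device memory, so no launch-time pinning is needed (a hedged sketch, not code from this PR):

julia> y = CuArray([0]);  # device-resident one-element buffer

julia> function kernel!(a)
           a[1] = threadIdx().x  # mutate device memory directly
           return
       end

julia> @cuda kernel!(y)

julia> Array(y)
1-element Vector{Int64}:
 1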

Fixes #267

@maleadt added labels: enhancement (New feature or request), cuda kernels (Stuff about writing CUDA kernels.) on Oct 4, 2023
maleadt (Member, Author) commented Nov 6, 2023

Looking into the failures: the problem is that Ref is also used in the context of broadcast, e.g. as RefValue{CuArray}, which has to be materialized as RefValue{CuDeviceArray}. That, however, breaks the ability to replace the Ref's own memory...

So we need to do something different based on the assumed user intent: either we differentiate based on whether the contents are isbits, or we only perform the Ref content conversion in the context of broadcast.
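The broadcast use in question treats the Ref as a 0-dimensional "scalar" container so its contents aren't iterated over, e.g. (a hedged illustration, not from the PR):

julia> a = CuArray(1:3);

julia> b = CuArray([2, 4]);

julia> a .∈ Ref(b)  # Ref marks `b` as a broadcast scalar; `b` itself must become a CuDeviceArray in the kernel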

codecov bot commented Nov 7, 2023

Codecov Report

Attention: 9 lines in your changes are missing coverage. Please review.

Comparison: base (6a8293b) at 72.34% vs. head (c062cfd) at 60.49%.

❗ The current head c062cfd differs from the pull request's most recent head f0d00cd. Consider uploading reports for commit f0d00cd to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           master    #2109       +/-   ##
===========================================
- Coverage   72.34%   60.49%   -11.85%     
===========================================
  Files         159      154        -5     
  Lines       14513    14043      -470     
===========================================
- Hits        10500     8496     -2004     
- Misses       4013     5547     +1534     
Files Coverage Δ
src/compiler/execution.jl 87.07% <86.66%> (-0.34%) ⬇️
lib/cudadrv/memory.jl 77.37% <0.00%> (-1.82%) ⬇️

... and 57 files with indirect coverage changes


@maleadt maleadt marked this pull request as ready for review November 7, 2023 09:54
@maleadt maleadt merged commit 0f3313d into master Nov 7, 2023
1 check was pending
@maleadt maleadt deleted the tb/ref branch November 7, 2023 11:25
maleadt added a commit that referenced this pull request Dec 21, 2023
Turns out broadcast uses ephemeral Ref boxes to pass scalars,
which get freed rapidly and can lead to illegal memory accesses
when the broadcast kernel accesses them.
@maleadt maleadt mentioned this pull request Dec 21, 2023
kshyatt pushed a commit that referenced this pull request Dec 27, 2023
maleadt added a commit that referenced this pull request Jan 6, 2024
Labels
cuda kernels (Stuff about writing CUDA kernels.), enhancement (New feature or request)

Development
Successfully merging this pull request may close these issues: Make Ref pass by-reference