Track array ownership to avoid illegal memory accesses #763

marius311 · 2021-03-12T10:22:49Z

This is one thing that I think would greatly improve the interactive single-process, multi-GPU workflow. Right now, if you accidentally trigger an illegal memory access (say, you forgot that some variable in your session isn't on the GPU you currently have active), it borks the whole session and you have to restart:

```julia
julia> using CUDA

julia> device!(0)

julia> x = cu(rand(2,2))
2×2 CuArray{Float32, 2}:
 0.911817  0.814058
 0.579863  0.511812

julia> device!(1)

julia> 2 .* x  # oops, forgot x is on device 0
2×2 CuArray{Float32, 2}:
Error showing value of type CuArray{Float32, 2}:
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:

# have to restart session now, all other (otherwise valid) GPU operations now throw illegal memory access
```

maleadt · 2021-03-12T11:50:39Z

That's a CUDA limitation, nothing we can do about it. File it with NVIDIA instead 😄

maleadt · 2021-03-12T11:54:49Z

Of course, we shouldn't be running into illegal memory accesses at all; CUDA.jl should be as safe to use as possible. In this case, we should probably be tracking which device owns an array.

marius311 · 2021-03-12T21:04:34Z

Yeah, I think just doing a little check on the CUDA.jl side would be pretty useful. I suppose this is already tracked, right? Far from the cleanest, but

```julia
findfirst(==(x.ctx), CUDA.__device_contexts)-1
```

does give you the device id that `x::CuArray` was created on.

maleadt · 2021-03-31T09:24:58Z

Note to self: it might be an idea to track the context in the buffer and disallow conversion to a pointer if the current context doesn't match the buffer's.
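The idea in this note can be illustrated with a small, self-contained mock that runs without a GPU. This is a hypothetical sketch, not CUDA.jl's actual implementation: `MockContext`, `ACTIVE_CONTEXT`, `mock_device!`, `TrackedBuffer`, `allocate` and `checked_pointer` are made-up names standing in for a CUDA context, the currently active context, `device!`, a device buffer, allocation, and the buffer-to-pointer conversion.

```julia
# Hypothetical sketch: none of these names are CUDA.jl API.

# Stand-in for a CUDA context; each device gets its own.
struct MockContext
    device::Int
end

# Stand-in for "the currently active context".
const ACTIVE_CONTEXT = Ref(MockContext(0))

# Stand-in for device!: switches the active context.
mock_device!(dev::Int) = (ACTIVE_CONTEXT[] = MockContext(dev); nothing)

# A buffer remembers the context it was allocated in.
struct TrackedBuffer
    ctx::MockContext
    addr::UInt
end

allocate(addr::UInt) = TrackedBuffer(ACTIVE_CONTEXT[], addr)

# Converting a buffer to a raw pointer is only allowed in its owning context,
# so a mismatch raises an ordinary Julia exception before any kernel runs.
function checked_pointer(buf::TrackedBuffer)
    current = ACTIVE_CONTEXT[]
    if buf.ctx != current
        throw(ArgumentError("buffer belongs to device $(buf.ctx.device), " *
                            "but device $(current.device) is active"))
    end
    return Ptr{Cvoid}(buf.addr)
end

# Usage, mirroring the session in the original report:
mock_device!(0)
x = allocate(UInt(0x1000))
mock_device!(1)
try
    checked_pointer(x)
catch err
    println(err isa ArgumentError)   # prints true
end
```

The point of checking at pointer conversion is that every kernel launch has to turn its array arguments into device pointers first, so a single check there catches the wrong-device case before the GPU ever touches the memory. The actual change in CUDA.jl may differ in detail.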

maleadt · 2024-04-27T17:43:20Z

This is implemented now.

marius311 added the `enhancement` (New feature or request) label Mar 12, 2021

maleadt changed the title ~~Prevent illegal memory accesses from borking session~~ Track array ownership to avoid illegal memory accesses Mar 12, 2021

maleadt closed this as completed Apr 27, 2024
