Track array ownership to avoid illegal memory accesses #763

Closed
marius311 opened this issue Mar 12, 2021 · 5 comments
Labels
enhancement New feature or request

Comments

@marius311
Contributor

This is one thing that I think would greatly improve the interactive single-process, multi-GPU workflow. Right now, if you accidentally trigger an illegal memory access (say you forgot that some variable in your session isn't on the currently active GPU), it borks the whole session and you have to restart:

julia> using CUDA

julia> device!(0)

julia> x = cu(rand(2,2))
2×2 CuArray{Float32, 2}:
 0.911817  0.814058
 0.579863  0.511812

julia> device!(1)

julia> 2 .* x  # oops forgot x is on device 0
2×2 CuArray{Float32, 2}:
Error showing value of type CuArray{Float32, 2}:
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:

# have to restart session now, all other (otherwise valid) GPU operations now throw illegal memory access
marius311 added the "enhancement" label on Mar 12, 2021
@maleadt
Member

maleadt commented Mar 12, 2021

That's a CUDA limitation, nothing we can do about it. File it with NVIDIA instead 😄

@maleadt
Member

maleadt commented Mar 12, 2021

Of course, we shouldn't be running into illegal memory accesses at all; CUDA.jl should be as safe to use as possible. In this case, we should probably be tracking which device owns an array.

maleadt changed the title from "Prevent illegal memory accesses from borking session" to "Track array ownership to avoid illegal memory accesses" on Mar 12, 2021
@marius311
Contributor Author

marius311 commented Mar 12, 2021

Yeah, I think just doing a little check on the CUDA.jl side would be pretty useful. I suppose this is already tracked, right? Far from the cleanest, but

findfirst(==(x.ctx), CUDA.__device_contexts)-1

does give you the device id that x::CuArray was created on.

@maleadt
Member

maleadt commented Mar 31, 2021

Note to self: it might be an idea to track the context in the buffer and disallow conversion to a pointer if the current context doesn't match the buffer's.
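
The idea in the note above can be illustrated with a minimal sketch in plain Julia that needs no GPU. It is not CUDA.jl's implementation: `MockContext`, `OwnedBuffer`, `CURRENT_CTX`, `activate!`, `allocate`, and `checked_pointer` are all hypothetical names invented for this illustration, and a `Vector{Float32}` stands in for device memory. The point is only that a buffer records its owning context at allocation time, and the conversion to a raw pointer throws an ordinary Julia exception when a different context is active, instead of letting the access reach the driver.

```julia
# Hypothetical stand-ins for illustration; none of these names are CUDA.jl API.
struct MockContext
    device_id::Int
end

# A buffer that remembers the context it was allocated in.
struct OwnedBuffer
    ctx::MockContext
    data::Vector{Float32}   # stands in for device memory
end

# The "currently active" context, switched much like device!(n) switches devices.
const CURRENT_CTX = Ref(MockContext(0))
activate!(id::Integer) = (CURRENT_CTX[] = MockContext(id); nothing)

# Allocation stamps the buffer with whatever context is active right now.
allocate(n::Integer) = OwnedBuffer(CURRENT_CTX[], zeros(Float32, n))

# Refuse to hand out a raw pointer unless the owning context is active.
function checked_pointer(buf::OwnedBuffer)
    if buf.ctx != CURRENT_CTX[]
        throw(ArgumentError("buffer belongs to device $(buf.ctx.device_id), " *
                            "but device $(CURRENT_CTX[].device_id) is active"))
    end
    return pointer(buf.data)
end

# Usage: mirrors the session from the original report.
activate!(0)
x = allocate(4)
checked_pointer(x)          # fine: device 0 owns x and is active

activate!(1)
try
    checked_pointer(x)      # wrong device: caught before any memory is touched
catch err
    println("caught: ", err.msg)
end
```

Because the failure is a regular exception raised on the host, the session stays usable afterwards, which is the behavior the original report asks for.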

@maleadt
Member

maleadt commented Apr 27, 2024

This is implemented now.

maleadt closed this as completed on Apr 27, 2024