Improve exception output #2342

maleadt · 2024-04-24T13:29:31Z

This PR expands the simple exception flag into an info struct that can hold much more information.
I then use it to:

- add an output lock so that exception information is reported only once, by a single thread
- store the exception type and reason forwarded from the quirk overlays
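
To illustrate the output lock, here is a rough CPU-side sketch of the idea. It is not the code from this PR: `ExceptionInfo`, its fields, and `report!` are hypothetical names, and host threads stand in for GPU threads. It assumes Julia 1.7 or later for `@atomic` struct fields.

```julia
# Hypothetical model of the info struct; the names are illustrative only.
mutable struct ExceptionInfo
    @atomic lock::Int   # 0 = free, 1 = claimed by the reporting thread
    type::String        # exception type, e.g. "BoundsError"
    reason::String      # human-readable reason
    thread::Int         # index of the thread that reported
end

ExceptionInfo() = ExceptionInfo(0, "", "", 0)

# Returns true only for the single caller that wins the lock.
function report!(info::ExceptionInfo, type, reason, thread)
    # Atomically flip 0 -> 1; every later caller sees 1 and backs off.
    res = @atomicreplace info.lock 0 => 1
    res.success || return false
    info.type = type
    info.reason = reason
    info.thread = thread
    return true
end

info = ExceptionInfo()
winners = zeros(Int, 8)
# Every "thread" hits the same error, but only one gets to report it.
Threads.@threads for t in 1:8
    winners[t] = report!(info, "BoundsError", "Out-of-bounds array access", t)
end
```

After the loop, exactly one entry of `winners` is set, which is the property that removes the interleaved duplicate lines from the "Before" output.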

Fixes #1780, #2341, significantly improving the output.

Before:

julia> using CUDA

julia> a = cu([1])
1-element CuArray{Int64, 1, CUDA.DeviceMemory}:
 1

julia> kernel(a) = (a[threadIdx().x]; nothing)
kernel (generic function with 1 method)

julia> @cuda threads=3 kernel(a)
CUDA.HostKernel for kernel(CuDeviceVector{Int64, 1})

julia> ERROR: Out-of-bounds array access.
ERROR: Out-of-bounds array access.
ERROR: a exception was thrown during kernel execution.
Stacktrace:
ERROR: a exception was thrown during kernel execution.
Stacktrace:
 [1] throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:4
 [1] throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:4
 [2] #throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:42
 [2] #throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:42
 [3] checkbounds at ./abstractarray.jl:702
 [3] checkbounds at ./abstractarray.jl:702
 [4] #arrayref at /home/tim/Julia/pkg/CUDA/src/device/array.jl:81
 [4] #arrayref at /home/tim/Julia/pkg/CUDA/src/device/array.jl:81
 [5] getindex at /home/tim/Julia/pkg/CUDA/src/device/array.jl:164
 [5] getindex at /home/tim/Julia/pkg/CUDA/src/device/array.jl:164
 [6] kernel at ./REPL[3]:1
 [6] kernel at ./REPL[3]:1

After:

julia> @cuda threads=3 kernel(a)
CUDA.HostKernel for kernel(CuDeviceVector{Int64, 1})

julia> ERROR: a BoundsError was thrown during kernel execution on thread (2, 1, 1) in block (1, 1, 1).
Out-of-bounds array access
Stacktrace:
 [1] throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:15
 [2] #throw_boundserror at /home/tim/Julia/pkg/CUDA/src/device/quirks.jl:53
 [3] checkbounds at ./abstractarray.jl:702
 [4] #arrayref at /home/tim/Julia/pkg/CUDA/src/device/array.jl:81
 [5] getindex at /home/tim/Julia/pkg/CUDA/src/device/array.jl:164
 [6] kernel at ./REPL[4]:1

Note that this still doesn't cover all exception-generating sites. For example, if code calls throw(BoundsError()) and no quirk provides additional exception information, we still report a plain Exception. An IR-level transformation to recover that information would be great, but GPUCompiler currently lacks the tooling for it.
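
The fallback can be pictured with a small host-side sketch. This is an assumption about the shape of the logic, not the actual runtime code: `describe` is a hypothetical helper, and an empty type string stands for "no quirk supplied any information".

```julia
# Hypothetical formatter: falls back to a generic name when no quirk
# forwarded an exception type, and omits the reason line when it is empty.
function describe(type::String, reason::String, thread, block)
    name = isempty(type) ? "Exception" : type
    msg = "ERROR: a $name was thrown during kernel execution on thread $thread in block $block."
    return isempty(reason) ? msg : msg * "\n" * reason
end

# With quirk information, as in the "After" output:
with_quirk = describe("BoundsError", "Out-of-bounds array access", (2, 1, 1), (1, 1, 1))

# Without quirk information, e.g. a direct throw(BoundsError()):
without_quirk = describe("", "", (2, 1, 1), (1, 1, 1))
```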

codecov · 2024-04-25T09:27:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.85%. Comparing base (eb45b2c) to head (b658315).
Report is 3 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2342      +/-   ##
==========================================
- Coverage   71.86%   71.85%   -0.02%     
==========================================
  Files         155      155              
  Lines       15074    15072       -2     
==========================================
- Hits        10833    10830       -3     
- Misses       4241     4242       +1


Use a lock to only report exception information once.

4be1e8a

maleadt marked this pull request as ready for review April 24, 2024 15:26

Forward exception type and reason from quirk overlays to runtime.

b658315

maleadt force-pushed the tb/exception_output branch from d39f639 to b658315 Compare April 24, 2024 15:26

maleadt mentioned this pull request Apr 24, 2024

Improve error message when assigning real valued arrray with complex numbers #2341

Closed

maleadt added the enhancement and cuda kernels labels Apr 25, 2024

maleadt mentioned this pull request Apr 25, 2024

Capture exception information JuliaGPU/GPUCompiler.jl#574

Open

maleadt merged commit 5dd6bb2 into master Apr 25, 2024
1 check passed

maleadt deleted the tb/exception_output branch April 25, 2024 09:37

maleadt linked an issue Apr 25, 2024 that may be closed by this pull request

Improve error message when assigning real valued arrray with complex numbers #2341

Closed
