
Rework context handling #2346

Merged
maleadt merged 21 commits into master from tb/context on Apr 26, 2024
Conversation

@maleadt (Member) commented Apr 25, 2024

Problem

CUDA contexts are annoying:

  • references to identical contexts can be constructed independently through different APIs (cuCtxCreate, cuCtxGetCurrent, etc.)
  • destroying a context means that all resources allocated in that context are now invalid, and cannot be used in any API call
  • after destroying a context, creating a new one may result in the same handle being reused
  • Julia can destroy objects out-of-order, e.g., first the CuContext, then a CuStream, even though the stream object had a reference to the context

All this significantly complicates our ability to determine whether objects are safe to use and/or need to be finalized. Currently, we solve this with a factory method that is guaranteed to return a unique context object for every session of a context handle (i.e., after handle destruction and recreation, this method returns a different object despite the handle being identical). Combined with targeted invalidation of that object from all known APIs that destroy a context, this makes it possible to automatically determine context validity in all derived objects storing a reference.

All this relies on multiple global dictionaries, which are slow and fragile: they have caused several thread-safety issues, and are problematic in finalizers, where we can't take locks to safely access the global dict. The approach also isn't guaranteed to be correct, especially when cooperating with other software that may call context APIs.
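To illustrate the fragility, the dictionary-based factory pattern looks roughly like this (a hypothetical sketch; the names and details are illustrative and do not match CUDA.jl's actual internals):

```julia
# Global state: a lock-protected dict mapping driver handles to context objects.
const context_lock = ReentrantLock()
const valid_contexts = Dict{Ptr{Cvoid},Any}()

mutable struct CtxObject
    handle::Ptr{Cvoid}
    valid::Bool          # flipped to false when the context is destroyed
end

# Factory: return the unique object for this handle, creating it on first use.
function get_context(handle::Ptr{Cvoid})
    lock(context_lock) do
        get!(valid_contexts, handle) do
            CtxObject(handle, true)
        end
    end
end

# Targeted invalidation, which must be called from every API known to destroy
# a context; missing one call site silently breaks validity tracking.
function invalidate!(ctx::CtxObject)
    lock(context_lock) do
        ctx.valid = false
        delete!(valid_contexts, ctx.handle)
    end
end
```

Note that both the factory and the invalidation path need the global lock, which is exactly what a finalizer cannot safely take.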

Solution

CUDA 12.0 provides a new driver API, cuCtxGetId, which returns a monotonically incrementing identifier that does change when a context is destroyed and re-allocated. This greatly simplifies the design:

  • we no longer need a single unique CuContext object, as we can uniquely identify the object by its identifier
  • we can simply check validity by ensuring we can fetch a context's ID, and that the ID matches what we stored at construction time

This makes it possible to demote CuContext to a simple immutable type, and get rid of all context-related global state, improving thread- and finalizer-safety, while making it much cheaper to store context objects in derived resources.
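To make the new scheme concrete, here is a minimal Julia sketch (illustrative only, and assuming a `libcuda` library handle is available; this is not the PR's actual code):

```julia
# CuContext stores the driver handle together with the ID that cuCtxGetId
# reported at construction time; validity is checked by re-querying the ID.
struct CuContext            # immutable: no mutable validity flag, no global dict
    handle::Ptr{Cvoid}
    id::UInt64
end

function CuContext(handle::Ptr{Cvoid})
    id = Ref{Culonglong}(0)
    # CUresult cuCtxGetId(CUcontext ctx, unsigned long long *ctxId)
    res = @ccall libcuda.cuCtxGetId(handle::Ptr{Cvoid},
                                    id::Ptr{Culonglong})::Cint
    res == 0 || error("cuCtxGetId failed with error $res")
    CuContext(handle, id[])
end

function Base.isvalid(ctx::CuContext)
    id = Ref{Culonglong}(0)
    res = @ccall libcuda.cuCtxGetId(ctx.handle::Ptr{Cvoid},
                                    id::Ptr{Culonglong})::Cint
    # invalid if the call fails (context destroyed), or if the ID changed
    # (the handle was reused by a newly created context)
    res == 0 && id[] == ctx.id
end
```

Since the ID is monotonically incrementing across context creations, a stale handle that the driver happens to reuse will report a different ID, so no bookkeeping outside the struct itself is needed.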

The flip side: CUDA.jl will require a CUDA 12.x-compatible driver. This seems acceptable to me, given the improvements in this PR and the fact that CUDA 12 has been out for quite a while. People relying on CUDA 11.x can always keep using CUDA.jl 5.x. If needed, we can even make additional releases of CUDA.jl 5.x if backport PRs are suggested.

cc @vchuravy

@maleadt added labels on Apr 25, 2024: enhancement (New feature or request), cuda libraries (Stuff about CUDA library wrappers.), performance (How fast can we go?)
codecov bot commented Apr 25, 2024

Codecov Report

Attention: Patch coverage is 79.24528%, with 11 lines in your changes missing coverage. Please review.

Project coverage is 60.33%. Comparing base (5dd6bb2) to head (a534c10).

❗ Current head a534c10 differs from the pull request's most recent head 51fb828. Consider uploading reports for commit 51fb828 to get more accurate results.

Files Patch % Lines
lib/cudadrv/stream.jl 42.85% 4 Missing ⚠️
src/memory.jl 70.00% 3 Missing ⚠️
lib/cudadrv/context.jl 92.85% 2 Missing ⚠️
lib/utils/call.jl 0.00% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2346      +/-   ##
==========================================
- Coverage   62.09%   60.33%   -1.77%     
==========================================
  Files         155      155              
  Lines       14965    14926      -39     
==========================================
- Hits         9293     9005     -288     
- Misses       5672     5921     +249     


@maleadt maleadt marked this pull request as ready for review April 25, 2024 13:13
Comment on lines +184 to +187
@test @allocated(current_context()) == 0
@test @allocated(context()) == 0
@test @allocated(stream()) == 0
@test @allocated(device()) == 0
maleadt (Member, Author):
Sorry @KristofferC...

A contributor replied:
Straight to blacklist ;)

Review thread on lib/cudadrv/context.jl (outdated, resolved)
maleadt commented Apr 25, 2024

Alternatively, maybe I should just get rid of the ability to reset contexts; this functionality isn't even handled properly by NVIDIA's own libraries...

maleadt commented Apr 26, 2024

Or, even better, we could support resetting contexts only on CUDA 12+. That should make it possible to keep compatibility with older drivers, as long as people don't reset the device.

maleadt commented Apr 26, 2024

Alright, CUDA 11.x support is back. Let's merge this once CI is green.

@maleadt maleadt merged commit 752571b into master Apr 26, 2024
1 check passed
@maleadt maleadt deleted the tb/context branch April 26, 2024 14:28
Labels: cuda libraries, enhancement, performance
3 participants