v5.3.2
CUDA v5.3.2
Merged pull requests:
- Add EnzymeCore extension for parent_job (#2281) (@vchuravy)
- Consider running GC when allocating and synchronizing (#2304) (@maleadt)
- Refactor memory wrappers (#2335) (@maleadt)
- Auto-detect external profilers. (#2339) (@maleadt)
- Fix performance of indexing unified memory. (#2340) (@maleadt)
- Improve exception output (#2342) (@maleadt)
- Test multigpu on CI (#2348) (@maleadt)
- cuQuantum 24.3: Bump cuTensorNet. (#2350) (@maleadt)
- cuQuantum 24.3: Bump cuStateVec. (#2351) (@maleadt)
Closed issues:
- CuArrays don't seem to display correctly in VS code (#875)
- Task scheduling can result in delays when synchronizing (#1525)
- Docs: add example on task-based parallelism with explicit synchronization (#1566)
- Exception output from many threads is not helpful (#1780)
- Autodetect external profiler (#2176)
- LazyInitialized is not GC-safe (#2216)
- Track CuArray stream usage (#2236)
- Improve cross-device usage (#2323)
- CUBLASLt wrapper for cublasLtMatmulDescSetAttribute can have device buffers as input (#2337)
- Improve error message when assigning real valued array with complex numbers (#2341)
- @device_code_sass broken (#2343)
- Readme says Cuda 11 is supported but also the last version to support it is v4.4 (#2345)
- @gcsafe_ccall breaks inlining of ccall wrappers (#2347)