v5.3.0
CUDA v5.3.0
Merged pull requests:
- CuSparseArrayCSR (fixed cat ambiguities from #1944) (#2244) (@nikopj)
- Slightly rework error handling (#2245) (@maleadt)
- cuTENSOR improvements (#2246) (@maleadt)
- Make `@device_code_sass` work with non-Julia kernels. (#2247) (@maleadt)
- Improve Tegra detection. (#2251) (@maleadt)
- Added few SparseArrays functions (#2254) (@albertomercurio)
- Reduce locking in the handle cache (#2256) (@maleadt)
- Mark all CUDA ccalls as GC safe (#2262) (@vchuravy)
- cuTENSOR: Fix reference to undefined variable (#2263) (@lkdvos)
- cuTENSOR: refactor obtaining compute_type as part of plan (#2264) (@lkdvos)
- Re-generate headers. (#2265) (@maleadt)
- Update to CUDNN 9. (#2267) (@maleadt)
- [CUBLAS] Use the ILP64 API with CUDA 12 (#2270) (@amontoison)
- CompatHelper: bump compat for GPUCompiler to 0.26, (keep existing compat) (#2271) (@github-actions[bot])
- Minor improvements to nonblocking synchronization. (#2272) (@maleadt)
- Add extension package for StaticArrays (#2273) (@trahflow)
- Fix cuTensor, cuTensorNet and cuStateVec when using local Toolkit (#2274) (@bjoe2k4)
- Cached workspace prototype for custatevec (#2279) (@kshyatt)
- Update the Julia wrappers for v12.4 (#2282) (@amontoison)
- Add support for CUDA 12.4. (#2286) (@maleadt)
- Test suite changes (#2288) (@maleadt)
- Fix mixed-buffer/mixed-shape broadcasts. (#2290) (@maleadt)
- Towards supporting Julia 1.11 (#2291) (@maleadt)
- Fix typo in performance tips (#2294) (@Zentrik)
- Make it possible to customize the CuIterator adaptor. (#2297) (@maleadt)
- Set default buffer size in `CUSPARSE` `mm!` functions (#2298) (@lpawela)
- Avoid OOMs during OOM handling. (#2299) (@maleadt)
- [CUSOLVER] Add tests for geqrf, orgqr and ormqr (#2300) (@amontoison)
- [CUSOLVER] Interface larft! (#2301) (@amontoison)
- Fix RNG determinism when using wrapped arrays. (#2307) (@maleadt)
- sortperm with dims (#2308) (@xaellison)
- [CUBLAS] Interface gemm_grouped_batched (#2310) (@amontoison)
- [CUSPARSE] Add a method convert for the type cusparseSpSMUpdate_t (#2311) (@amontoison)
- Avoid capturing `AbstractArray`s in `BoundsError` (#2314) (@lcw)
- Clarify debug level hint. (#2316) (@maleadt)
Closed issues:
- Failed to compile PTX code when using NSight on Win11 (#1601)
- `sortperm` fails with `dims` keyword (#2061)
- NVTX-related segfault on Windows under compute-sanitizer (#2204)
- Inverse Complex-to-Real FFT allocates GPU memory (#2249)
- cuDNN not available for your platform (#2252)
- Cannot reset CuArray to zero (#2257)
- Cannot take gradient of `sort` on 2D CuArray (#2259)
- Multi-threaded code hanging forever with Julia 1.10 (#2261)
- CUBLAS: nrm2 support for StridedCuArray with length requiring Int64 (#2268)
- Adjoint not supported on Diagonal arrays (#2275)
- Regression in broadcast: getting Array (Julia 1.10) instead of CuArray (Julia 1.9) (#2276)
- Release v5.3? (#2283)
- Wrap CUDSS? (#2287)
- Bug concerning broadcast between device array and unified array (#2289)
- `StackOverflowError` trying to throw `OutOfGPUMemoryError`, subsequent errors (#2292)
- BUG: sortperm! seems to perform much slower than it should (#2293)
- Multiplying `CuSparseMatrixCSC` by `CuMatrix` results in `Out of GPU memory` (#2296)
- BFloat16 support broken on Julia 1.11 (#2306)
- does not emit line info for debugging/profiling (#2312)
- Kernel using `StaticArray` compiles in julia v1.9.4 but not in v1.10.2 (#2313)
- Using copyto! with SharedArray trigger scalar indexing disallowed error (#2317)