v4.4.2

@github-actions released this 04 Apr 09:27

CUDA v4.4.2

Diff since v4.4.1

Merged pull requests:

Closed issues:

  • Element-wise conversion to Duals (#127)
  • IDEA: CuHostArray (#28)
  • Make Ref pass by-reference (#267)
  • Failed to compile PTX code when using NSight on Win11 (#1601)
  • view(data, idx) boundschecking is disproportionately expensive (#1678)
  • [CUSOLVER] Add a with_workspaces function to allocate two buffers (Device / Host) (#1767)
  • Trouble using nsight systems for profiling CUDA in Julia (#1779)
  • dlopen("libcudart") results in duplicate libraries (#1814)
  • Support for JLD2 (#1833)
  • Windows Defender mis-labels artifacts as threat (#1836)
  • Support Cholesky factorization of CuSparseMatrixCSR (#1855)
  • Runtime not re-selected after driver upgrade (#1877)
  • Failure to initialize with CUDA_VISIBLE_DEVICES='' (#1945)
  • Cannot precompile GPU code with PrecompileTools (#2006)
  • Evaluating sparse matrices in the REPL has a huge memory footprint (#2016)
  • CUDA_SDK_jll: cuda.h in different locations depending on the platform (#2066)
  • StaticArrays.SHermitianCompact not working in kernels in Julia 1.10.0-beta2 (#2069)
  • Support for LinearAlgebra.pinv (#2070)
  • PTX ISA 8.1 support (#2080)
  • Segmentation fault when importing CUDA (#2083)
  • "No system CUDA driver found" on NixOS (#2089)
  • CUDA.rand(Int64, m, n) can not be used when m or n is zero (#2093)
  • Missing CUDA_Runtime_Discovery as a dependency in cuDNN (#2094)
  • Binaries for Jetson (#2105)
  • Minimum/maximum of array of NaNs is infinity (#2111)
  • Performance regression for multiple @sync copyto! on CUDA v5 (#2112)
  • [CUBLAS] Regenerate the wrappers with updated argument types (#2115)
  • More informative errors when parameter size is too big (#2119)
  • Unable to allocate unified memory buffers (#2120)
  • CUDA 12.3 has been released (#2122)
  • atomic min, max for Float32 and Float64 (#2129)
  • Native profiler output is limited to around 100 columns when printing to a file (#2130)
  • Intermittent CI failure: Segfault during nonblocking synchronization (#2141)
  • LLVM generates max.NaN which only works on sm_80 (#2148)
  • Unified memory-related error on Tegra T194 (#2149)
  • Errors on sm_61 (#2150)
  • First test for Julia/CUDA with 15 failures (#2158)
  • High CPU load during GPU synchronization (#2161)
  • Modifying struct containing CuArray fails in threads in 5.0.0 and 5.1.0 (#2171)
  • Update to CUTENSOR 2.0 (#2174)
  • Matmul of CuArray{ComplexF32} and CuArray{Float32} is slow (#2175)
  • Support for combining duplicate elements in sparse matrices (#2185)
  • Interactive sessions: periodically trim the memory pool (#2190)
  • Broadcast does not preserve buffer type (#2191)
  • CUDA doesn't precompile on Julia nightly/1.11 (#2195)
  • Latest julia: UndefVarError: make_seed not defined in Random (#2198)
  • NVTX-related segfault on Windows under compute-sanitizer (#2204)
  • CUDA installation fails on Apple Silicon/Julia 1.10 (#2211)
  • Most recent package versions not supported on CUDA.jl (#2212)
  • Testing of CUDA fails (#2222)
  • Tests fail for CUDA#master (#2223)
  • --debug-info=2 makes NNlibCUDACUDNNExt precompilation run forever (#2225)
  • Test failures on Nvidia GH200 (#2227)
  • mul! should support strided outputs (#2230)
  • Please add support for older cuda versions (cuda 8 and older) (#2231)
  • NSight Compute: prevent API calls during precompilation (#2233)
  • Integrated profiler: detect lack of permissions (#2237)
  • Inverse Complex-to-Real FFT allocates GPU memory (#2249)
  • cuDNN not available for your platform (#2252)
  • Cannot reset CuArray to zero (#2257)
  • Cannot take gradient of sort on 2D CuArray (#2259)
  • Multi-threaded code hanging forever with Julia 1.10 (#2261)
  • CUBLAS: nrm2 support for StridedCuArray with length requiring Int64 (#2268)
  • Adjoint not supported on Diagonal arrays (#2275)
  • Regression in broadcast: getting Array (Julia 1.10) instead of CuArray (Julia 1.9) (#2276)
  • Release v5.3? (#2283)
  • Wrap CUDSS? (#2287)
  • Bug concerning broadcast between device array and unified array (#2289)
  • StackOverflowError trying to throw OutOfGPUMemoryError, subsequent errors (#2292)
  • BUG: sortperm! seems to perform much slower than it should (#2293)
  • Multiplying CuSparseMatrixCSC by CuMatrix results in Out of GPU memory (#2296)
  • BFloat16 support broken on Julia 1.11 (#2306)