Releases: JuliaGPU/CUDA.jl

v4.4.2

04 Apr 09:27

CUDA v4.4.2

Diff since v4.4.1

Merged pull requests:

Closed issues:

  • Element-wise conversion to Duals (#127)
  • IDEA: CuHostArray (#28)
  • Make Ref pass by-reference (#267)
  • Failed to compile PTX code when using NSight on Win11 (#1601)
  • view(data, idx) boundschecking is disproportionately expensive (#1678)
  • [CUSOLVER] Add a with_workspaces function to allocate two buffers (Device / Host) (#1767)
  • Trouble using nsight systems for profiling CUDA in Julia (#1779)
  • dlopen("libcudart") results in duplicate libraries (#1814)
  • Support for JLD2 (#1833)
  • Windows Defender mis-labels artifacts as threat (#1836)
  • Support Cholesky factorization of CuSparseMatrixCSR (#1855)
  • Runtime not re-selected after driver upgrade (#1877)
  • Failure to initialize with CUDA_VISIBLE_DEVICES='' (#1945)
  • Cannot precompile GPU code with PrecompileTools (#2006)
  • Evaluating sparse matrices in the REPL has a huge memory footprint (#2016)
  • CUDA_SDK_jll: cuda.h in different locations depending on the platform (#2066)
  • StaticArrays.SHermitianCompact not working in kernels in Julia 1.10.0-beta2 (#2069)
  • Support for LinearAlgebra.pinv (#2070)
  • PTX ISA 8.1 support (#2080)
  • Segmentation fault when importing CUDA (#2083)
  • "No system CUDA driver found" on NixOS (#2089)
  • CUDA.rand(Int64, m, n) can not be used when m or n is zero (#2093)
  • Miss...

v5.2.0

18 Jan 10:44
5876e9d

CUDA v5.2.0

Diff since v5.1.2

Merged pull requests:

Closed issues:

  • Trouble using nsight systems for profiling CUDA in Julia (#1779)
  • Evaluating sparse matrices in the REPL has a huge memory footprint (#2016)
  • Intermittent CI failure: Segfault during nonblocking synchronization (#2141)
  • First test for Julia/CUDA with 15 failures (#2158)
  • Update to CUTENSOR 2.0 (#2174)
  • Tests fail for CUDA#master (#2223)
  • Test failures on Nvidia GH200 (#2227)
  • mul! should support strided outputs (#2230)
  • Please add support for older cuda versions (cuda 8 and older) (#2231)
  • NSight Compute: prevent API calls during precompilation (#2233)
  • Integrated profiler: detect lack of permissions (#2237)

v5.1.2

07 Jan 10:34
fc99b1d

CUDA v5.1.2

Diff since v5.1.1

Merged pull requests:

Closed issues:

  • More informative errors when parameter size is too big (#2119)
  • Modifying struct containing CuArray fails in threads in 5.0.0 and 5.1.0 (#2171)
  • Matmul of CuArray{ComplexF32} and CuArray{Float32} is slow (#2175)
  • Support for combining duplicate elements in sparse matrices (#2185)
  • Interactive sessions: periodically trim the memory pool (#2190)
  • Broadcast does not preserve buffer type (#2191)
  • CUDA doesn't precompile on Julia nightly/1.11 (#2195)
  • Latest julia: UndefVarError: make_seed not defined in Random (#2198)
  • CUDA installation fails on Apple Silicon/Julia 1.10 (#2211)
  • Most recent package versions not supported on CUDA.jl (#2212)
  • Testing of CUDA fails (#2222)
  • --debug-info=2 makes NNlibCUDACUDNNExt precompilation run forever (#2225)

v5.1.1

20 Nov 11:38
ffcd7e3

CUDA v5.1.1

Diff since v5.1.0

Merged pull requests:

Closed issues:

  • High CPU load during GPU synchronization (#2161)

v5.1.0

07 Nov 15:10

CUDA v5.1.0

CUDA.jl 5.1 greatly improves support for two important parts of the CUDA toolkit: unified memory, for accessing GPU memory on the CPU and vice versa, and cooperative groups, which offer a more modular approach to kernel programming. For more details, see the blog post.

Diff since v5.0.0

Merged pull requests:

Closed issues:

  • Element-wise conversion to Duals (#127)
  • IDEA: CuHostArray (#28)
  • Make Ref pass by-reference (#267)
  • view(data, idx) boundschecking is disproportionately expensive (#1678)
  • [CUSOLVER] Add a with_workspaces function to allocate two buffers (Device / Host) (#1767)
  • dlopen("libcudart") results in duplicate libraries (#1814)
  • Support for JLD2 (#1833)
  • Windows Defender mis-labels artifacts as threat (#1836)
  • Support Cholesky factorization of CuSparseMatrixCSR (#1855)
  • Runtime not re-selected after driver upgrade (#1877)
  • Failure to initialize with CUDA_VISIBLE_DEVICES='' (#1945)
  • Cannot precompile GPU code with PrecompileTools (#2006)
  • CUDA_SDK_jll: cuda.h in different locations depending on the platform (#2066)
  • PTX ISA 8.1 support (#2080)
  • Segmentation fault when importing CUDA (#2083)
  • "No system CUDA driver found" on NixOS (#2089)
  • CUDA.rand(Int64, m, n) can not be used when m or n is zero (#2093)
  • Missing CUDA_Runtime_Discovery as a dependency in cuDNN (#2094)
  • Binaries for Jetson (#2105)
  • Minimum/maximum of array of NaNs is infinity (#2111)
  • Performance regression for multiple @sync copyto! on CUDA v5 (#2112)
  • [CUBLAS] Regenerate the wrappers with updated argument types (#2115)
  • Unable to allocate unified memory buffers (#2120)
  • CUDA 12.3 has been released (#2122)
  • atomic min, max for Float32 and Float64 (#2129)
  • Native profiler output is limited to around 100 columns when printing to a file (#2130)
  • LLVM generates max.NaN which only works on sm_80 (#2148)
  • Unified memory-related error on Tegra T194 (#2149)
  • Errors on sm_61 (#2150)

v5.0.0

19 Sep 08:39
2fa6572

CUDA v5.0.0

Blog post: https://info.juliahub.com/cuda-jl-5-0-changes

This is a breaking release, but the breaking changes are minimal (see the blog post for details):

  • Julia 1.8 is now required, and only CUDA 11.4+ is supported
  • The selection of local toolkits has changed slightly

Diff since v4.4.1

Merged pull requests:

Closed issues:

  • StaticArrays.SHermitianCompact not working in kernels in Julia 1.10.0-beta2 (#2069)
  • Support for LinearAlgebra.pinv (#2070)

v4.4.1

25 Aug 20:24

CUDA v4.4.1

Diff since v4.4.0

Closed issues:

  • CUDA driver device support does not match toolkit (#70)
  • Launching kernels should not allocate (#66)
  • sync_threads() appears to not be sync'ing threads (#61)
  • Exception when using CuArrays with Flux (#129)
  • Kernel using MVector fails to compile or crashes at runtime due to heap allocation (#45)
  • Performance regression on matrix multiplication between CUDA.jl 1.3.3 and 2.1.0/master (#538)
  • Improve 'VS C++ redistributable' error message (#764)
  • CUSPARSE does not support reductions (#1406)
  • CUDA test failed (#1690)
  • Type constructor in broadcast doesn't compile (#1761)
  • accumulate(+) gives different results for CuArray compared to Array. (#1810)
  • Compat driver: preload all libraries (#1859)
  • Stream synchronization is slow when waiting on the event from CUDA (#1910)
  • cuDNN: Store convolution algorithm choice to disk. (#1947)
  • Disable 'No CUDA-capable device found' error log (#1955)
  • CUDNN_STATUS_NOT_SUPPORTED using 1D CNN model (#1977)
  • Memory allocations during in-place sparse matrix-vector multiplication (#1982)
  • CUSPARSE.sum_dim1 sums the absolute values of elements (#1983)
  • Update to CUDA 12.2 (#1984)
  • unsafe_wrap fails on zero element CuArrays (#1985)
  • rand in kernel works in a deterministic way (#2008)
  • Scalar indexing with CuArray * ReshapedArray{SubArray{CuArray}}} (#2009)
  • volumerhs performance regression (#2010)
  • CuSparseMatrix constructors allocate too much memory? (#2015)
  • Native profiler using CUPTI (#2017)
  • libLLVM-15jl.so (#2018)
  • "symbol multiply defined" error (#2021)
  • Confusion on row major vs column major (#2023)
  • Printing of CuArrays gives zeros or random numbers (#2033)
  • sortperm! fails when output is UInt vector (#2046)
  • Re-introduce spinning loop before nonblocking synchronization (#2057)

Merged pull requests:

v4.4.0

26 Jun 20:29
315c80e

CUDA v4.4.0

Diff since v4.3.2

Closed issues:

  • Unreachable control flow leads to illegal divergent barriers (#1746)
  • CUBLAS fails on new CUDA.jl v4 (#1852)
  • Sort fails on Lovelace (sm8.9) GPUs (#1874)
  • gesvd! crashes on Pascal and v12.0 (#1932)
  • No effect for calling "nsys launch" (#1938)
  • Basic math operations with nested adjoint and transpose (#1940)
  • CPU and GPU implementations return results at dissimilar scales, even in double precision arithmetics (#1950)
  • Failed CUDA.jl initialization breaks Flux? (#1952)
  • Recent mul! changes break multiplication with matrices that have StaticArray elements (#1953)
  • Test infrastructure: define test groups (#1961)
  • Strange rand errors when sampling large matrices (#1963)
  • Add aqua tests (#1964)
  • Support of Orin GPU from Nvidia ? (#1966)
  • Crash in LLVM (#1971)
  • Warning cuDNN Convolution (#1972)
  • Strange behaviour when installed at system level (#1973)

Merged pull requests:

v4.3.2

02 Jun 05:55
acd245e

CUDA v4.3.2

Diff since v4.3.1

Merged pull requests:

v4.3.1

31 May 19:40
b7420f8

CUDA v4.3.1

Diff since v4.3.0

Closed issues:

  • Array testsuite compiles kernel with large types (#1902)
  • CUDA.jl v4 installs CUDA runtime despite version=local (#1922)
  • Occasional "CUSOLVERError: an internal operation failed (code 7, CUSOLVER_STATUS_INTERNAL_ERROR)" (#1924)
  • Does [email protected] need [email protected]? (#1929)

Merged pull requests: