v5.3.4

github-actions released this 15 May 19:28
c373258

CUDA v5.3.4

Diff since v5.3.3

Merged pull requests:

Closed issues:

  • Native Softmax (#175)
  • CUSOLVER: support eigendecomposition (#173)
  • backslash with gpu matrices crashes julia (#161)
  • at-benchmark captures GPU arrays (#156)
  • Support kernels returning Union{} (#62)
  • mul! falls back to generic implementation (#148)
  • \ on qr factorization objects gives a method error (#138)
  • Compiler failure if dependent module only contains a japi1 function (#49)
  • copy!(dst, src) and copyto!(dst, src) are significantly slower and allocate more memory than copyto!(dest, do, src, so[, N]) (#126)
  • Calling Flux.gpu on a view dumps core (#125)
  • Creating CuArray{Tracker.TrackedReal{Float64},1} a few times causes segfaults (#121)
  • Guard against exceeding maximum kernel parameter size (#32)
  • Detect common API misuse in error handlers (#31)
  • rand and friends default to Float64 (#108)
  • \ does not work for least squares (#104)
  • ERROR_ILLEGAL_ADDRESS when broadcasting modular arithmetic (#94)
  • CuIterator assumes batches to consist of multiple arrays (#86)
  • Algebra with UniformScaling Uses Generic Fallback Scalar Indexing (#85)
  • Document (un)supported language features for kernel programming (#13)
  • Missing dispatch for indexing of reshaped arrays (#556)
  • Track array ownership to avoid illegal memory accesses (#763)
  • NVPTX i128 support broken on LLVM 11 / Julia 1.6 (#793)
  • Support for sm_80 cp.async: asynchronous on-device copies (#850)
  • Profiling Julia with Nsight Systems on Windows results in blank window (#862)
  • sort! and partialsort! are considerably slower than CPU versions (#937)
  • mul! does not dispatch on Adjoint (#1363)
  • Cross-device copy of wrapped arrays fails (#1377)
  • Memory allocation becomes very slow when reserved bytes is large (#1540)
  • Cannot reclaim GPU Memory; CUDA.reclaim() (#1562)
  • Add eigen for general purpose computation of eigenvectors/eigenvalues (#1572)
  • device_reset! does not seem to work anymore (#1579)
  • device-side rand() are not random between successive kernel launches (#1633)
  • Add EnzymeRules support for CUDA.jl (for forward mode here) (#1811)
  • cusparseSetStream_v2 not defined (#1820)
  • Feature request: Integrating the latest CUDA library "cuLitho" into CUDA.jl (#1821)
  • KernelAbstractions.jl-related issues (#1838)
  • lock failing in multithreaded plan_fft() (#1921)
  • CUSolver finalizer tries to take ReentrantLock (#1923)
  • Testsuite could be more careful about parallel testing (#2192)
  • Opportunistic GC collection (#2303)
  • Unable to use local CUDA runtime toolkit (#2367)
  • Enzyme prevents testing on 1.11 (#2376)