Releases: JuliaGPU/CUDA.jl
v2.2.1
CUDA v2.2.1
v2.2.0
CUDA v2.2.0
Closed issues:
- cudnn missing after downloading artifact (#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (#542)
Merged pull requests:
- Update manifest (#520) (@github-actions[bot])
- Try out Buildkite. (#522) (@maleadt)
- Update manifest (#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (#530) (@maleadt)
- Fix and test svd! (#531) (@maleadt)
- Move more CI to Buildkite. (#532) (@maleadt)
- Use type symbols to generate wrapper methods (#534) (@cqql)
- Fully move to Buildkite. (#537) (@maleadt)
- Add unit_diag option for sv2! functions (#540) (@amontoison)
- Documentation fixes (#543) (@maleadt)
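Among this release's fixes, #531 repairs and tests svd! for GPU arrays. A minimal sketch of the exercised code path (assuming a CUDA-capable GPU; the svd interface itself is the standard LinearAlgebra API, not specific to this PR):

```julia
using CUDA, LinearAlgebra

# SVD of a CuArray dispatches to CUSOLVER under the hood.
A = CUDA.rand(Float32, 4, 3)
F = svd(A)
# Reconstruct and compare on the CPU to sanity-check the factorization.
isapprox(collect(F.U * Diagonal(F.S) * F.Vt), collect(A); atol=1f-4)
```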
v2.1.0
CUDA v2.1.0
Closed issues:
- CUDNN convolution with Float16 always returns zeros (#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (#144)
- Dispatching to generic matmul instead of CUBLAS (#164)
- Support for Ints and Float16? (#165)
- Subarrays/views support (#172)
- Easy way to pick among multiple GPUs (#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (#247)
- Pkg.test error for cutensor test on Windows (#422)
- Runtime build improvements (#456)
- Fusing Wrappers (#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (#482)
- mapreduce assumes commutative op (#484)
- SubArray Broadcast Bug in 2.0 (#488)
- Nested SubArray Scalar Indexing (#490)
- Sparse matrix * view(vector) regression in 2.0 (#493)
- Error transforming a reshaped 0-dimensional GPU array to a CPU array (#494)
- test cuda FAILURE (#496)
- Reshaped CuArray is not DenseCuArray (#511)
- assignment failure when using array slicing. (#516)
Merged pull requests:
- Use the correct CUDNN scaling parameter type. (#454) (@maleadt)
- Fix versioned dylib discovery. (#486) (@maleadt)
- Move inv from GPUArrays. (#487) (@maleadt)
- Use dense array types in sparse wrappers. (#495) (@maleadt)
- Update manifest (#497) (@github-actions[bot])
- Revert array wrapper union changes (#498) (@maleadt)
- Clean-up pointer field. (#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (#500) (@maleadt)
- Use versioned libcuda (#502) (@maleadt)
- Dynamically choose versioned libcuda (#503) (@mustafaquraish)
- Update multigpu.md (#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (#506) (@maleadt)
- Update dependencies. (#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (#508) (@maleadt)
- Update manifest (#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (#512) (@maleadt)
- Don't return a pointer from 'alias'. (#513) (@maleadt)
- Add some docs (#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (#515) (@maleadt)
- Fix cooperative launch test. (#517) (@maleadt)
- Fixes for Windows (#518) (@maleadt)
- CUTENSOR fixes on Windows (#519) (@maleadt)
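Issue #174 ("Easy way to pick among multiple GPUs") and the multigpu.md update (#504) relate to device selection. A hedged sketch using the device-management API as documented around this release (requires at least one CUDA GPU):

```julia
using CUDA

# Enumerate the visible GPUs, then bind subsequent work to one of them.
for dev in CUDA.devices()
    println(dev)
end
CUDA.device!(0)            # select the first GPU for this task
a = CUDA.rand(Float32, 8)  # allocated on the currently selected device
```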
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- sum(CUDA.rand(3,3)) broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz)
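PR #489 changes cu's default promotion so that double-precision complex inputs are demoted to single precision, matching the existing Float64-to-Float32 behavior. A small sketch of the intended behavior (assuming a CUDA-capable GPU):

```julia
using CUDA

# cu() prefers 32-bit element types for GPU friendliness; per #489 this
# now also applies to complex floating-point inputs.
x = rand(ComplexF64, 4)
d = cu(x)
eltype(d)  # expected to be ComplexF32 under the new default
```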
v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert SparseMatrixCSC to CuSparseMatrixCSR via cu by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- StatsBase.transform fails on CuArray (#426)
- Further unification of CUBLAS.axpy! and LinearAlgebra.BLAS.axpy! (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
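PR #424 introduces a global math mode for opting in to lower-precision functionality (addressing #354). A hedged sketch, assuming the math_mode! entry point as it shipped in CUDA.jl 2.0 (constant names may differ in later releases):

```julia
using CUDA

# Allow libraries such as CUBLAS to trade precision for speed,
# e.g. by using tensor cores where available.
CUDA.math_mode!(CUDA.FAST_MATH)

A = CUDA.rand(Float32, 128, 128)
B = CUDA.rand(Float32, 128, 128)
C = A * B  # may now run with reduced-precision fast paths
```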
v1.3.3
v1.3.2
v1.3.1
CUDA v1.3.1
Closed issues:
- Element-wise conversion fails (#378)
- atomic_min fails for Int32 in global CuDeviceArrays (#379)
- Segmentation fault from @cuprint on char (#381)
- error in versioninfo(), name not defined (#385)
Merged pull requests:
- Fix docs (#330) (@maleadt)
- Wrap cusparseSpMV (#351) (@marius311)
- specify Cchar rather than char in the doc for @cuprint (#382) (@MasonProtter)
- Adapt to LLVM.jl changes for stateless codegen. (#383) (@maleadt)
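PR #382 clarifies that device-side printing takes Cchar rather than char. A minimal kernel sketch of device-side formatted output (assuming a CUDA-capable GPU; shown with @cuprintf, the printf-style macro):

```julia
using CUDA

function hello_kernel()
    @cuprintf("hello from thread %d\n", threadIdx().x)
    return
end

@cuda threads=2 hello_kernel()
synchronize()  # flush device-side output before the host continues
```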
v1.3.0
CUDA v1.3.0
Closed issues:
- Trouble with the @. macro (#346)
- NVMLError: Not Supported (code 3) (#348)
- Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
- Could not load CUTENSOR artifact dll on Windows 10 (#355)
- CuTextureArray for 3D array (#357)
- Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
- Callback-based logging (#366)
- Artifact download timeout (#369)
- sum! accumulates when called multiple times (#370)
- nvprof does not detect kernel launches (#371)
- KernelError: passing and using non-bitstype argument (#372)
- CUDA.jl fails to find libcudadevrt.a on a cluster install with multi-arch target (#376)
Merged pull requests:
- Make the memory allocator context-aware (#253) (@maleadt)
- Update manifest (#347) (@github-actions[bot])
- Guard against unsupported NVML usage in the test runner. (#352) (@maleadt)
- Bump CUDNN to v8.0.2 (#353) (@maleadt)
- Rework thread state management (#356) (@maleadt)
- Update manifest (#358) (@github-actions[bot])
- Memory allocator simplifications (#361) (@maleadt)
- Deduplicate code from memory pools (#362) (@maleadt)
- Fix show of ArrayBuffer. (#363) (@maleadt)
- Clean-up the Buffer interface. (#364) (@maleadt)
- Use callback APIs to get library debug logs. (#367) (@maleadt)
- Allow selecting the memcheck tool. (#368) (@maleadt)
- Update GPUArrays. (#373) (@maleadt)
- Update to CUDA 11.0 update 1 (#374) (@maleadt)
- Number and iterate devices in versioninfo() following CUDA. (#375) (@maleadt)
- Reinstate support for Julia 1.3 (#377) (@maleadt)
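PR #375 makes versioninfo() number and iterate devices in the same order CUDA does. A hedged sketch of inspecting the same information manually (device names via CUDA.name, assuming at least one GPU):

```julia
using CUDA

CUDA.versioninfo()  # now lists devices in CUDA's own ordering

# Roughly equivalent manual enumeration:
for (i, dev) in enumerate(CUDA.devices())
    println(i - 1, ": ", CUDA.name(dev))
end
```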