Releases: JuliaGPU/CUDA.jl
v2.2.1
CUDA v2.2.1
v2.2.0
CUDA v2.2.0
Closed issues:
- cudnn missing after downloading artifact (#521)
- Downloading artifact: CUDA110 when using DiffEqFlux (#542)
Merged pull requests:
- Update manifest (#520) (@github-actions[bot])
- Try out Buildkite. (#522) (@maleadt)
- Update manifest (#529) (@github-actions[bot])
- Support for / Upgrade to CUDA 11.1 update 1. (#530) (@maleadt)
- Fix and test svd! (#531) (@maleadt)
- Move more CI to Buildkite. (#532) (@maleadt)
- Use type symbols to generate wrapper methods (#534) (@cqql)
- Fully move to Buildkite. (#537) (@maleadt)
- Add unit_diag option for sv2! functions (#540) (@amontoison)
- Documentation fixes (#543) (@maleadt)
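Among this release's fixes, #531 repairs and tests svd! for GPU arrays. A minimal sketch of the exercised code path (assuming a CUDA-capable GPU; the svd interface itself is the standard LinearAlgebra API, not specific to this PR):

```julia
using CUDA, LinearAlgebra

# SVD of a CuArray dispatches to CUSOLVER under the hood.
A = CUDA.rand(Float32, 4, 3)
F = svd(A)
# Reconstruct and compare on the CPU to sanity-check the factorization.
isapprox(collect(F.U * Diagonal(F.S) * F.Vt), collect(A); atol=1f-4)
```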
v2.1.0
CUDA v2.1.0
Closed issues:
- CUDNN convolution with Float16 always returns zeros (#92)
- axp(b)y! and mul! (scalar multiplication) with mixed argument types (#144)
- Dispatching to generic matmul instead of CUBLAS (#164)
- Support for Ints and Float16? (#165)
- Subarrays/views support (#172)
- Easy way to pick among multiple GPUs (#174)
- More prominently document JULIA_CUDA_USE_BINARYBUILDER (#204)
- ERROR_COOPERATIVE_LAUNCH_TOO_LARGE during tests (#247)
- Pkg.test error for cutensor test on Windows (#422)
- Runtime build improvements (#456)
- Fusing Wrappers (#467)
- Could not find nvToolsExt (libnvToolsExt.dylib.1.0 or libnvToolsExt.dylib.1) in /Users/imac/.julia/artifacts/b502baf54095dff4a69fd6aba8667124583f6929/lib (#482)
- mapreduce assumes commutative op (#484)
- SubArray Broadcast Bug in 2.0 (#488)
- Nested SubArray Scalar Indexing (#490)
- Sparse matrix * view(vector) regression in 2.0 (#493)
- Error transforming a reshaped 0-dimensional GPU array to a CPU array (#494)
- test cuda FAILURE (#496)
- Reshaped CuArray is not DenseCuArray (#511)
- assignment failure when using array slicing. (#516)
Merged pull requests:
- Use the correct CUDNN scaling parameter type. (#454) (@maleadt)
- Fix versioned dylib discovery. (#486) (@maleadt)
- Move inv from GPUArrays. (#487) (@maleadt)
- Use dense array types in sparse wrappers. (#495) (@maleadt)
- Update manifest (#497) (@github-actions[bot])
- Revert array wrapper union changes (#498) (@maleadt)
- Clean-up pointer field. (#499) (@maleadt)
- mapreduce: change iteration for compatibility with non-commutative operators. (#500) (@maleadt)
- Use versioned libcuda (#502) (@maleadt)
- Dynamically choose versioned libcuda (#503) (@mustafaquraish)
- Update multigpu.md (#504) (@efmanu)
- Upgrade artifacts for CUDA 11 compatibility. (#506) (@maleadt)
- Update dependencies. (#507) (@maleadt)
- Convert unsigned short ints to Cint for printf. (#508) (@maleadt)
- Update manifest (#510) (@github-actions[bot])
- Fix reshape with missing dimensions. (#512) (@maleadt)
- Don't return a pointer from 'alias'. (#513) (@maleadt)
- Add some docs (#514) (@maleadt)
- Fix CUDNN-optimized activation broadcasts (#515) (@maleadt)
- Fix cooperative launch test. (#517) (@maleadt)
- Fixes for Windows (#518) (@maleadt)
- CUTENSOR fixes on Windows (#519) (@maleadt)
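Issue #174 ("Easy way to pick among multiple GPUs") and the multigpu.md update (#504) relate to device selection. A hedged sketch using the device-management API as documented around this release (requires at least one CUDA GPU):

```julia
using CUDA

# Enumerate the visible GPUs, then bind subsequent work to one of them.
for dev in CUDA.devices()
    println(dev)
end
CUDA.device!(0)            # select the first GPU for this task
a = CUDA.rand(Float32, 8)  # allocated on the currently selected device
```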
v2.0.2
CUDA v2.0.2
Closed issues:
- cu() behavior for complex floating point numbers (#91)
- Error when following example on using multiple GPUs on multiple processes (#468)
- MacOS without nvidia GPU is trying to download CUDA111 on julia nightly (#469)
- Drop BinaryProvider? (#474)
- Latest version of master doesn't work on Windows (#477)
- sum(CUDA.rand(3,3)) broken (#480)
- copyto!() between cpu and gpu with subarrays (#491)
Merged pull requests:
- Adapt to GPUCompiler changes. (#458) (@maleadt)
- Fix initialization of global state (#471) (@maleadt)
- Remove 'view' implementation. (#472) (@maleadt)
- Workaround new artifact"" eagerness that prevents loading on unsupported platforms (#473) (@ianshmean)
- Remove BinaryProvider dep. (#475) (@maleadt)
- typo: libcuda.dll -> libcuda.so on Linux (#476) (@Alexander-Barth)
- NFC array simplifications. (#481) (@maleadt)
- Update manifest (#485) (@github-actions[bot])
- Convert AbstractArray{ComplexF64} to CuArray{ComplexF32} by default (#489) (@pabloferz)
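PR #489 changes cu's default promotion so that double-precision complex inputs are demoted to single precision, matching the existing Float64-to-Float32 behavior. A small sketch of the intended behavior (assuming a CUDA-capable GPU):

```julia
using CUDA

# cu() prefers 32-bit element types for GPU friendliness; per #489 this
# now also applies to complex floating-point inputs.
x = rand(ComplexF64, 4)
d = cu(x)
eltype(d)  # expected to be ComplexF32 under the new default
```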
v2.0.1
v2.0.0
CUDA v2.0.0
Closed issues:
- Test failure during threading tests (#15)
- Bad allocations in memory pool after device_reset! (#16)
- CuArrays can lose Blas on reshaped views (#78)
- allowscalar performance (#87)
- Indexing with a CuArrays causes a 'scalar indexing disallowed' error from checkbounds (#90)
- 5-arg mul! for CUSPARSE (#98)
- copyto!(Device, Host) uses scalar iteration in case of type mismatch (#105)
- Array primitives broken for CUSPARSE arrays (#113)
- SplittingPool: CPU allocations (#117)
- error while concatenating to an empty CuArray (#139)
- Showing sparse arrays goes wrong (#146)
- Improve test coverage (#147)
- CuArrays allocates a lot of memory on the default GPU (#153)
- [Feature Request] Indexing CuArray with CuArray (#155)
- Reshaping CuArray throws error during backpropagation (#162)
- Match syntax and APIs against Julia 1.0 standard libraries (#163)
- CURAND_STATUS_PREEXISTING_FAILURE when setting seed multiple times. (#212)
- RFC: convert SparseMatrixCSC to CuSparseMatrixCSR via cu by default (#216)
- Add a CuSparseMatrixCOO type (#220)
- Test runner stumbles over path separators (#236)
- Error: Invalid bitcode signature when loading CUDA.jl after precompilation (#293)
- Atomic operations only work on global memory (#311)
- Performance: cudnn algorithm selection (#318)
- CUSPARSE is broken in CUDA.jl 1.2 (#322)
- Device-side broadcast regression on 1.5 (#350)
- API for fast math-like mode (#354)
- CUDA 11.0 Update 1: cublasSetWorkspace (#365)
- Can't precompile CUDA.jl on Kubuntu 20.04 (#396)
- CuPtr should be Ptr in cudnnGetDropoutDescriptor (#397)
- CUDA throws OOM error when initializing API on multiple devices (#398)
- Cannot launch kernel with > 5 args using Dynamic Parallelism (#401)
- Reverse performance regression (#410)
- Tag for LLVM 3? (#412)
- CUDA not working (#415)
- StatsBase.transform fails on CuArray (#426)
- Further unification of CUBLAS.axpy! and LinearAlgebra.BLAS.axpy! (#432)
- size(range), length(range) and range[end] fail inside CUDA kernels (#434)
- InitError: Cannot use memory pool 'binned' when CUDA.jl was precompiled for memory pool 'split'. (#446)
- Missing dispatch for matrix multiplication with views? (#448)
- New version not available yet? (#452)
- using CUDA or CUArray, output: UndefVarError: AddrSpacePtr not defined (#457)
- Unable to upgrade to the latest version (#459)
Merged pull requests:
- Performance improvements by calling cuDNN API (#321) (@gartangh)
- Use ccall wrapper for correct pointer type conversions (#392) (@maleadt)
- Simplify Statistics.var and fix dims=tuple. (#393) (@maleadt)
- Adapt to GPUArrays test change. (#394) (@maleadt)
- Default to per-thread stream semantics (#395) (@maleadt)
- Add a missing context argument for stateless codegen. (#399) (@maleadt)
- Keep track of package latency timings. (#400) (@maleadt)
- Update manifest (#402) (@github-actions[bot])
- Latency improvements (#403) (@maleadt)
- Fix bounds checking with GPU views. (#404) (@maleadt)
- Force specialization for dynamic_cudacall to support more arguments. (#407) (@maleadt)
- Fix some wrong pointer types in the CUDNN headers. (#408) (@maleadt)
- Refactor CUSPARSE (#409) (@maleadt)
- Fix typo (#411) (@yixingfu)
- Update manifest (#413) (@github-actions[bot])
- Simplify library wrappers by introducing a CUDA Ref (#414) (@maleadt)
- Simplify and update wrappers (#416) (@maleadt)
- GEMM improvements (#417) (@maleadt)
- CompatHelper: add new compat entry for "BFloat16s" at version "0.1" (#418) (@github-actions[bot])
- add CuSparseMatrixCOO (#421) (@marius311)
- Update manifest (#423) (@github-actions[bot])
- Global math mode for easy use of lower-precision functionality (#424) (@maleadt)
- Improve init error message (#425) (@maleadt)
- CUBLAS: wrap rot! to implement rotate! and reflect! (#427) (@maleadt)
- CUFFT-related optimizations (#428) (@maleadt)
- Fix reverse/view regression (#429) (@maleadt)
- Update packages (#433) (@maleadt)
- Introduce StridedCuArray (#435) (@maleadt)
- Retry curandGenerateSeeds when OOM. (#436) (@maleadt)
- Introduce DenseCuArray union (#437) (@maleadt)
- Array simplifications (#438) (@maleadt)
- Fix and test reverse on wrapped array. (#439) (@maleadt)
- Fixes after recent array wrapper changes (#441) (@maleadt)
- Adapt to GPUArrays changes. (#442) (@maleadt)
- Provide CUBLAS with a pool-backed workspace. (#443) (@maleadt)
- Fix finalization of copied arrays. (#444) (@maleadt)
- Support for/Add CUDA 11.1 (#445) (@maleadt)
- Update manifest (#449) (@github-actions[bot])
- Allow use of strided vectors with mul! (gemv! and gemm!) (#450) (@maleadt)
- Have convert call CuSparseArray's constructors. (#451) (@maleadt)
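PR #424 introduces a global math mode for opting in to lower-precision functionality (addressing #354). A hedged sketch, assuming the math_mode! entry point as it shipped in CUDA.jl 2.0 (constant names may differ in later releases):

```julia
using CUDA

# Allow libraries such as CUBLAS to trade precision for speed,
# e.g. by using tensor cores where available.
CUDA.math_mode!(CUDA.FAST_MATH)

A = CUDA.rand(Float32, 128, 128)
B = CUDA.rand(Float32, 128, 128)
C = A * B  # may now run with reduced-precision fast paths
```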
v1.3.3
v1.3.2
v1.3.1
CUDA v1.3.1
Closed issues:
- Element-wise conversion fails (#378)
- atomic_min fails for Int32 in global CuDeviceArrays (#379)
- Segmentation fault from @cuprint on char (#381)
- error in versioninfo(), name not defined (#385)
Merged pull requests:
- Fix docs (#330) (@maleadt)
- Wrap cusparseSpMV (#351) (@marius311)
- specify Cchar rather than char in the doc for @cuprint (#382) (@MasonProtter)
- Adapt to LLVM.jl changes for stateless codegen. (#383) (@maleadt)
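PR #382 clarifies that device-side printing takes Cchar rather than char. A minimal kernel sketch of device-side formatted output (assuming a CUDA-capable GPU; shown with @cuprintf, the printf-style macro):

```julia
using CUDA

function hello_kernel()
    @cuprintf("hello from thread %d\n", threadIdx().x)
    return
end

@cuda threads=2 hello_kernel()
synchronize()  # flush device-side output before the host continues
```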
v1.3.0
CUDA v1.3.0
Closed issues:
- Trouble with the @. macro (#346)
- NVMLError: Not Supported (code 3) (#348)
- Nvidia Xavier devices: exception thrown during kernel execution on device Xavier (#349)
- Could not load CUTENSOR artifact dll on Windows 10 (#355)
- CuTextureArray for 3D array (#357)
- Bug in julia 1.5.0 I have CUDA 11.0 installed in Ubuntu 18.04 (#360)
- Callback-based logging (#366)
- Artifact download timeout (#369)
- sum! accumulates when called multiple times (#370)
- nvprof does not detect kernel launches (#371)
- KernelError: passing and using non-bitstype argument (#372)
- CUDA.jl fails to find libcudadevrt.a on a cluster install with multi-arch target (#376)
Merged pull requests:
- Make the memory allocator context-aware (#253) (@maleadt)
- Update manifest (#347) (@github-actions[bot])
- Guard against unsupported NVML usage in the test runner. (#352) (@maleadt)
- Bump CUDNN to v8.0.2 (#353) (@maleadt)
- Rework thread state management (#356) (@maleadt)
- Update manifest (#358) (@github-actions[bot])
- Memory allocator simplifications (#361) (@maleadt)
- Deduplicate code from memory pools (#362) (@maleadt)
- Fix show of ArrayBuffer. (#363) (@maleadt)
- Clean-up the Buffer interface. (#364) (@maleadt)
- Use callback APIs to get library debug logs. (#367) (@maleadt)
- Allow selecting the memcheck tool. (#368) (@maleadt)
- Update GPUArrays. (#373) (@maleadt)
- Update to CUDA 11.0 update 1 (#374) (@maleadt)
- Number and iterate devices in versioninfo() following CUDA. (#375) (@maleadt)
- Reinstate support for Julia 1.3 (#377) (@maleadt)
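PR #375 makes versioninfo() number and iterate devices in the same order CUDA does. A hedged sketch of inspecting the same information manually (device names via CUDA.name, assuming at least one GPU):

```julia
using CUDA

CUDA.versioninfo()  # now lists devices in CUDA's own ordering

# Roughly equivalent manual enumeration:
for (i, dev) in enumerate(CUDA.devices())
    println(i - 1, ": ", CUDA.name(dev))
end
```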