Multiplying CuSparseMatrixCSC by CuMatrix results in Out of GPU memory #2296
Comments
I also tried reproducing on some different systems, and the same errors occur on:

julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.6
CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+545.23.6
Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: NVIDIA RTX A6000 (sm_86, 44.256 GiB / 44.988 GiB available)
1: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900K
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

Strangely, on the following system the CSC examples pass and the CSR one fails:

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
Threads: 1 on 36 virtual cores
julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3
CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3
Julia packages:
- CUDA.jl: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
- CUDA_Runtime_Discovery: 0.2.3
Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1
2 devices:
0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)

After upgrading to newer Julia and CUDA.jl:

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3
CUDA libraries:
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3
Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0
Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
2 devices:
0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)
julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 36 virtual cores)

The behavior stays the same: CSC works, CSR does not. It seems that the CSC examples work with CUDA 11 and fail with CUDA 12.
@lpawela A hotfix is to specify an initial value of 0 for the size of the buffer here.
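To make the idea concrete, here is a minimal sketch of the kind of change being suggested; it is illustrative only and does not reproduce the exact CUDA.jl line linked above. The point is that the Ref receiving the buffer size from the CUSPARSE bufferSize-style query gets an explicit initial value, so leftover garbage in uninitialized memory cannot be read back as an enormous allocation request:

# Illustrative sketch only; `out` stands for the Ref passed to the
# cusparseSpMM_bufferSize-style query inside the CUDA.jl wrapper.
out = Ref{Csize_t}()    # before: left uninitialized
out = Ref{Csize_t}(0)   # suggested hotfix: start from an explicit 0
                        # (later bumped to 10000, see the comment further down)
# the workspace buffer is then allocated with out[] bytes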
@amontoison setting
@amontoison I went through the code, setting the buffers to zero in some places (master...lpawela:CUDA.jl:lp/sparse-buffer-size). Now I get the following errors:

julia> sparse32csc * dense32 # ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] nonblocking_synchronize(val::CuContext)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:174
[3] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:185
[4] device_synchronize
@ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
[5] maybe_synchronize_cuda()
@ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
[6] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:208
julia> dense32 * sparse32csc # NO ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] isdone
@ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
[3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
[4] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
[5] device_synchronize
@ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
[6] maybe_synchronize_cuda()
@ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
[7] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:208
caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] check
@ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
[3] cuMemAllocFromPoolAsync
@ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
[4] #alloc#1
@ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
[5] alloc
@ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
[6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
@ CUDA ~/lib/CUDA.jl/src/pool.jl:66
[7] actual_alloc
@ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
[8] #1060
@ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
[9] retry_reclaim
@ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
[10] macro expansion
@ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
[11] macro expansion
@ ./timing.jl:395 [inlined]
[12] #_alloc#1059
@ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
[13] _alloc
@ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
[14] #alloc#1058
@ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
[15] alloc
@ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
[16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
@ CUDA ~/lib/CUDA.jl/src/array.jl:74
[17] CuArray
@ ~/lib/CUDA.jl/src/array.jl:147 [inlined]
[18] CuArray
@ ~/lib/CUDA.jl/src/array.jl:162 [inlined]
[19] *(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32})
@ CUDA.CUSPARSE ~/lib/CUDA.jl/lib/cusparse/interfaces.jl:129
[20] top-level scope
@ REPL[9]:1
[21] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:206
julia> (sparse32csc' * dense32')' # ERROR
WARNING: Error while freeing DeviceBuffer(4 bytes at 0x0000000302001800):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] check
@ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
[3] cuMemFreeAsync
@ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
[4] free(buf::CUDA.Mem.DeviceBuffer; stream::CuStream)
@ CUDA.Mem ~/lib/CUDA.jl/lib/cudadrv/memory.jl:97
[5] free
@ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:92 [inlined]
[6] #actual_free#1042
@ ~/lib/CUDA.jl/src/pool.jl:78 [inlined]
[7] actual_free
@ ~/lib/CUDA.jl/src/pool.jl:75 [inlined]
[8] #_free#1067
@ ~/lib/CUDA.jl/src/pool.jl:523 [inlined]
[9] _free
@ ~/lib/CUDA.jl/src/pool.jl:510 [inlined]
[10] macro expansion
@ ~/lib/CUDA.jl/src/pool.jl:495 [inlined]
[11] macro expansion
@ ./timing.jl:395 [inlined]
[12] #free#1066
@ ~/lib/CUDA.jl/src/pool.jl:494 [inlined]
[13] free
@ ~/lib/CUDA.jl/src/pool.jl:483 [inlined]
[14] (::CUDA.var"#1073#1074"{CUDA.Mem.DeviceBuffer, Bool})()
@ CUDA ~/lib/CUDA.jl/src/array.jl:101
[15] #context!#954
@ ~/lib/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
[16] context!
@ ~/lib/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
[17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
@ CUDA ~/lib/CUDA.jl/src/array.jl:89
[18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
@ GPUArrays ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:42
[19] unsafe_free!
@ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:91 [inlined]
[20] unsafe_finalize!(xs::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
@ CUDA ~/lib/CUDA.jl/src/array.jl:113
[21] top-level scope
@ REPL[10]:1
[22] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:206
[23] eval
@ ./boot.jl:385 [inlined]
[24] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
@ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
[25] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
@ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
[26] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
@ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
[27] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
@ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
[28] run_repl(repl::REPL.AbstractREPL, consumer::Any)
@ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
[29] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
@ Base ./client.jl:432
[30] #invokelatest#2
@ ./essentials.jl:892 [inlined]
[31] invokelatest
@ ./essentials.jl:889 [inlined]
[32] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
@ Base ./client.jl:416
[33] exec_options(opts::Base.JLOptions)
@ Base ./client.jl:333
[34] _start()
@ Base ./client.jl:552
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] isdone
@ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
[3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
[4] device_synchronize(; blocking::Bool, spin::Bool)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
[5] device_synchronize
@ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
[6] maybe_synchronize_cuda()
@ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
[7] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:208
caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
[1] throw_api_error(res::CUDA.cudaError_enum)
@ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
[2] check
@ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
[3] cuMemAllocFromPoolAsync
@ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
[4] #alloc#1
@ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
[5] alloc
@ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
[6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
@ CUDA ~/lib/CUDA.jl/src/pool.jl:66
[7] actual_alloc
@ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
[8] #1060
@ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
[9] retry_reclaim
@ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
[10] macro expansion
@ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
[11] macro expansion
@ ./timing.jl:395 [inlined]
[12] #_alloc#1059
@ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
[13] _alloc
@ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
[14] #alloc#1058
@ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
[15] alloc
@ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
[16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
@ CUDA ~/lib/CUDA.jl/src/array.jl:74
[17] similar
@ ~/lib/CUDA.jl/src/array.jl:196 [inlined]
[18] similar
@ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/adjtrans.jl:361 [inlined]
[19] *(A::LinearAlgebra.Adjoint{Float32, CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}}, B::LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})
@ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:106
[20] top-level scope
@ REPL[10]:1
[21] top-level scope
@ ~/lib/CUDA.jl/src/initialization.jl:206
Can you try with a default buffer size of 10000 instead of 0?
@amontoison Yes, this works, thank you. I started a PR with these changes (#2298). Maybe some other places also need updating?
It's a bug in the NVIDIA routine, so we should use this only as a workaround for now.
Describe the bug
Some multiplications with dense and sparse matrices result in an "Out of GPU memory" error.
To reproduce
The Minimal Working Example (MWE) for this bug:
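A hedged sketch of what the MWE likely looks like, assuming the variable names used elsewhere in the thread (sparse32csc, dense32, sparse32csr) and the 1x1 size mentioned under "Expected behavior"; this is a reconstruction, not the reporter's original code:

# Hedged reconstruction of the MWE; names and sizes are assumptions.
using CUDA, CUDA.CUSPARSE, SparseArrays, LinearAlgebra

dense32     = CUDA.rand(Float32, 1, 1)                        # 1x1 dense CuMatrix
sparse32csc = CuSparseMatrixCSC(sparse(rand(Float32, 1, 1)))  # 1x1 sparse, CSC
sparse32csr = CuSparseMatrixCSR(sparse(rand(Float32, 1, 1)))  # 1x1 sparse, CSR

sparse32csc * dense32          # reported to throw "Out of GPU memory"
dense32 * sparse32csc
(sparse32csc' * dense32')'
sparse32csr * dense32          # the CSR case discussed in the comments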
Manifest.toml
Expected behavior
Correct 1x1 matrix multiplication.
Version info
Details on Julia:
Details on CUDA: