Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multiplying CuSparseMatrixCSC by CuMatrix results in Out of GPU memory #2296

Closed
4 tasks done
lpawela opened this issue Mar 20, 2024 · 7 comments · Fixed by #2298
Closed
4 tasks done

Multiplying CuSparseMatrixCSC by CuMatrix results in Out of GPU memory #2296

lpawela opened this issue Mar 20, 2024 · 7 comments · Fixed by #2298
Labels
bug Something isn't working

Comments

@lpawela
Copy link
Contributor

lpawela commented Mar 20, 2024

Sanity checks (read this first, then remove this section)

  • Make sure you're reporting a bug; for general questions, please use Discourse or
    Slack.

  • If you're dealing with a performance issue, make sure you disable scalar iteration
    (CUDA.allowscalar(false)). Only file an issue if that shows scalar iteration happening
    in CUDA.jl or Base Julia, as opposed to your own code.

  • If you're seeing an error message, follow the error message instructions, if any
    (e.g. inspect code with @device_code_warntype). If you can't solve the problem using
    that information, make sure to post it as part of the issue.

  • Always ensure you're using the latest version of CUDA.jl, and if possible, please
    check the master branch to see if your issue hasn't been resolved yet.

If your bug is still valid, please go ahead and fill out the template below.

Describe the bug

Some multiplications with dense and sparse matrices result in

ERROR: Out of GPU memory trying to allocate 113.295 TiB
Effective GPU memory usage: 19.63% (1.916 GiB/9.759 GiB)
Memory pool usage: 72 bytes (32.000 MiB reserved)

Stacktrace:
  [1] macro expansion
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:435 [inlined]
  [2] macro expansion
    @ ./timing.jl:395 [inlined]
  [3] #_alloc#991
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:424 [inlined]
  [4] _alloc
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:419 [inlined]
  [5] #alloc#990
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:409 [inlined]
  [6] alloc
    @ ~/.julia/packages/CUDA/35NC6/src/pool.jl:403 [inlined]
  [7] CuArray{UInt8, 1, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64})
    @ CUDA ~/.julia/packages/CUDA/35NC6/src/array.jl:93
  [8] CuArray
    @ ~/.julia/packages/CUDA/35NC6/src/array.jl:176 [inlined]
  [9] CuArray
    @ ~/.julia/packages/CUDA/35NC6/src/array.jl:183 [inlined]
 [10] with_workspace(f::CUDA.CUSPARSE.var"#1330#1332"{}, eltyp::Type{…}, size::CUDA.CUSPARSE.var"#bufferSize#1331"{}, fallback::Nothing; keep::Bool)
    @ CUDA.APIUtils ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:67
 [11] with_workspace
    @ ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:58 [inlined]
 [12] with_workspace (repeats 2 times)
    @ ~/.julia/packages/CUDA/35NC6/lib/utils/call.jl:55 [inlined]
 [13] mm!(transa::Char, transb::Char, alpha::Bool, A::CUDA.CUSPARSE.CuSparseMatrixCSC{…}, B::CuArray{…}, beta::Bool, C::CuArray{…}, index::Char, algo::CUDA.CUSPARSE.cusparseSpMMAlg_t)
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/generic.jl:237
 [14] mm!
    @ ~/.julia/packages/CUDA/35NC6/lib/cusparse/generic.jl:197 [inlined]
 [15] mm_wrapper(transa::Char, transb::Char, alpha::Bool, A::CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, B::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}, beta::Bool, C::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/interfaces.jl:46
 [16] generic_matmatmul!(C::CuArray{…}, tA::Char, tB::Char, A::CUDA.CUSPARSE.CuSparseMatrixCSC{…}, B::CuArray{…}, _add::LinearAlgebra.MulAddMul{…})
    @ CUDA.CUSPARSE ~/.julia/packages/CUDA/35NC6/lib/cusparse/interfaces.jl:76
 [17] mul!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:263 [inlined]
 [18] mul!
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:237 [inlined]
 [19] *(A::CUDA.CUSPARSE.CuSparseMatrixCSC{Float64, Int32}, B::CuArray{Float64, 2, CUDA.Mem.DeviceBuffer})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:106
 [20] top-level scope
    @ REPL[17]:1
 [21] top-level scope
    @ ~/.julia/packages/CUDA/35NC6/src/initialization.jl:190

To reproduce

The Minimal Working Example (MWE) for this bug:

using CUDA
using SparseArrays

dense32 = CUDA.rand(1, 1)
sparse32csc = cu(sprand(Float32, 1, 1, 1.))

dense64 = CUDA.rand(Float64, 1, 1)
sparse64csc = CUSPARSE.CuSparseMatrixCSC{Float64, Int32}(sparse32csc.colPtr, sparse32csc.rowVal, sparse32csc.nzVal, (1, 1))

sparse64csr = CUSPARSE.CuSparseMatrixCSR(dense64)

sparse32csc * dense32 # ERROR
dense32 * sparse32csc # NO ERROR
(sparse32csc' * dense32')' # ERROR

sparse64csc * dense64 # ERROR
dense64 * sparse64csc # NO ERROR
(dense64' * sparse64csc')' # NO ERROR

sparse64csr * dense64 # NO ERROR
dense64 * sparse64csr # ERROR
(sparse64csr' * dense64')' # NO ERROR
Manifest.toml

# This file is machine-generated - editing it directly is not advised

julia_version = "1.10.1"
manifest_format = "2.0"
project_hash = "aa98b1f95df17ebf5569f91383899271dde8d36d"

[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.5.0"
weakdeps = ["ChainRulesCore", "Test"]

    [deps.AbstractFFTs.extensions]
    AbstractFFTsChainRulesCoreExt = "ChainRulesCore"
    AbstractFFTsTestExt = "Test"

[[deps.Adapt]]
deps = ["LinearAlgebra", "Requires"]
git-tree-sha1 = "cde29ddf7e5726c9fb511f340244ea3481267608"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "3.7.2"
weakdeps = ["StaticArrays"]

    [deps.Adapt.extensions]
    AdaptStaticArraysExt = "StaticArrays"

[[deps.ArgTools]]
uuid = "0dad84c5-d112-42e6-8d28-ef12dabb789f"
version = "1.1.1"

[[deps.ArnoldiMethod]]
deps = ["LinearAlgebra", "Random", "StaticArrays"]
git-tree-sha1 = "62e51b39331de8911e4a7ff6f5aaf38a5f4cc0ae"
uuid = "ec485272-7323-5ecc-a04f-4719b315124d"
version = "0.2.0"

[[deps.Artifacts]]
uuid = "56f22d72-fd6d-98f1-02f0-08ddc0907c33"

[[deps.Atomix]]
deps = ["UnsafeAtomics"]
git-tree-sha1 = "c06a868224ecba914baa6942988e2f2aade419be"
uuid = "a9b6321e-bd34-4604-b9c9-b65b8de01458"
version = "0.1.0"

[[deps.AutoHashEquals]]
git-tree-sha1 = "45bb6705d93be619b81451bb2006b7ee5d4e4453"
uuid = "15f4f7f2-30c1-5605-9d31-71845cf9641f"
version = "0.2.0"

[[deps.BFloat16s]]
deps = ["LinearAlgebra", "Printf", "Random", "Test"]
git-tree-sha1 = "dbf84058d0a8cbbadee18d25cf606934b22d7c66"
uuid = "ab4f0b2a-ad5b-11e8-123f-65d77653426b"
version = "0.4.2"

[[deps.Base64]]
uuid = "2a0f44e3-6c83-55bd-87e4-b1978d98bd5f"

[[deps.BitFlags]]
git-tree-sha1 = "2dc09997850d68179b69dafb58ae806167a32b1b"
uuid = "d1d4a3ce-64b1-5f1a-9ba4-7e7e69966f35"
version = "0.1.8"

[[deps.CEnum]]
git-tree-sha1 = "eb4cb44a499229b3b8426dcfb5dd85333951ff90"
uuid = "fa961155-64e5-5f13-b03f-caf6b980ea82"
version = "0.4.2"

[[deps.CSTParser]]
deps = ["Tokenize"]
git-tree-sha1 = "b544d62417a99d091c569b95109bc9d8c223e9e3"
uuid = "00ebfdb7-1f24-5e51-bd34-a7502290713f"
version = "3.4.2"

[[deps.CSV]]
deps = ["Dates", "Mmap", "Parsers", "PooledArrays", "SentinelArrays", "Tables", "Unicode"]
git-tree-sha1 = "b83aa3f513be680454437a0eee21001607e5d983"
uuid = "336ed68f-0bac-5ca0-87d4-7b16caf5d00b"
version = "0.8.5"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Preferences", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "968c1365e2992824c3e7a794e30907483f8469a9"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "4.4.1"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "498f45593f6ddc0adff64a9310bb6710e851781b"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.5.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "2cb12f6b2209f40a4b8967697689a47c50485490"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "5248d9c45712e51e27ba9b30eebec65658c6ce29"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.6.0+0"

[[deps.CUTENSOR_jll]]
deps = ["Artifacts", "CUDA_Runtime_jll", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "e231d9b8894558e22bb35910a2c5e7458655744f"
uuid = "35b6c64b-1ee1-5834-92a3-3f624899209a"
version = "1.7.0+1"

[[deps.ChainRulesCore]]
deps = ["Compat", "LinearAlgebra"]
git-tree-sha1 = "575cd02e080939a33b6df6c5853d14924c08e35b"
uuid = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
version = "1.23.0"
weakdeps = ["SparseArrays"]

    [deps.ChainRulesCore.extensions]
    ChainRulesCoreSparseArraysExt = "SparseArrays"

[[deps.CodecZlib]]
deps = ["TranscodingStreams", "Zlib_jll"]
git-tree-sha1 = "59939d8a997469ee05c4b4944560a820f9ba0d73"
uuid = "944b1d66-785c-5afd-91f1-9de20f533193"
version = "0.7.4"

[[deps.CommonMark]]
deps = ["Crayons", "JSON", "PrecompileTools", "URIs"]
git-tree-sha1 = "532c4185d3c9037c0237546d817858b23cf9e071"
uuid = "a80b9123-70ca-4bc0-993e-6e3bcb318db6"
version = "0.8.12"

[[deps.Compat]]
deps = ["TOML", "UUIDs"]
git-tree-sha1 = "c955881e3c981181362ae4088b35995446298b80"
uuid = "34da2185-b29b-5c13-b0c7-acf172513d20"
version = "4.14.0"
weakdeps = ["Dates", "LinearAlgebra"]

    [deps.Compat.extensions]
    CompatLinearAlgebraExt = "LinearAlgebra"

[[deps.CompilerSupportLibraries_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "e66e0078-7015-5450-92f7-15fbd957f2ae"
version = "1.1.0+0"

[[deps.ConcurrentUtilities]]
deps = ["Serialization", "Sockets"]
git-tree-sha1 = "9c4708e3ed2b799e6124b5673a712dda0b596a9b"
uuid = "f0e56b4a-5159-44fe-b623-3e5288b988bb"
version = "2.3.1"

[[deps.Crayons]]
git-tree-sha1 = "249fe38abf76d48563e2f4556bebd215aa317e15"
uuid = "a8cc5b0e-0ffa-5ad4-8c14-923d3ee1735f"
version = "4.1.1"

[[deps.DataAPI]]
git-tree-sha1 = "abe83f3a2f1b857aac70ef8b269080af17764bbe"
uuid = "9a962f9c-6df0-11e9-0e5d-c546b8b5ee8a"
version = "1.16.0"

[[deps.DataStructures]]
deps = ["Compat", "InteractiveUtils", "OrderedCollections"]
git-tree-sha1 = "0f4b5d62a88d8f59003e43c25a8a90de9eb76317"
uuid = "864edb3b-99cc-5e75-8d2d-829cb0a9cfe8"
version = "0.18.18"

[[deps.DataValueInterfaces]]
git-tree-sha1 = "bfc1187b79289637fa0ef6d4436ebdfe6905cbd6"
uuid = "e2d170a0-9d28-54be-80f0-106bbe20a464"
version = "1.0.0"

[[deps.Dates]]
deps = ["Printf"]
uuid = "ade2ca70-3891-5945-98fb-dc099432e06a"

[[deps.DelimitedFiles]]
deps = ["Mmap"]
git-tree-sha1 = "9e2f36d3c96a820c678f2f1f1782582fcf685bae"
uuid = "8bb1440f-4735-579b-a4ab-409b98df4dab"
version = "1.9.1"

[[deps.Distances]]
deps = ["LinearAlgebra", "Statistics", "StatsAPI"]
git-tree-sha1 = "66c4c81f259586e8f002eacebc177e1fb06363b0"
uuid = "b4f34e82-e78d-54a5-968a-f98e89d6e8f7"
version = "0.10.11"
weakdeps = ["ChainRulesCore", "SparseArrays"]

    [deps.Distances.extensions]
    DistancesChainRulesCoreExt = "ChainRulesCore"
    DistancesSparseArraysExt = "SparseArrays"

[[deps.Distributed]]
deps = ["Random", "Serialization", "Sockets"]
uuid = "8ba89e20-285c-5b6f-9357-94700520ee1b"

[[deps.DocStringExtensions]]
deps = ["LibGit2"]
git-tree-sha1 = "2fb1e02f2b635d0845df5d7c167fec4dd739b00d"
uuid = "ffbed154-4ef7-542d-bbb7-c09d3a79fcae"
version = "0.9.3"

[[deps.Downloads]]
deps = ["ArgTools", "FileWatching", "LibCURL", "NetworkOptions"]
uuid = "f43a241f-c20a-4ad4-852c-f6b1247861c6"
version = "1.6.0"

[[deps.ExceptionUnwrapping]]
deps = ["Test"]
git-tree-sha1 = "dcb08a0d93ec0b1cdc4af184b26b591e9695423a"
uuid = "460bff9d-24e4-43bc-9d9f-a8973cb893f4"
version = "0.1.10"

[[deps.ExprTools]]
git-tree-sha1 = "27415f162e6028e81c72b82ef756bf321213b6ec"
uuid = "e2ba6199-217a-4e67-a87a-7c52f15ade04"
version = "0.1.10"

[[deps.FFTW]]
deps = ["AbstractFFTs", "FFTW_jll", "LinearAlgebra", "MKL_jll", "Preferences", "Reexport"]
git-tree-sha1 = "4820348781ae578893311153d69049a93d05f39d"
uuid = "7a1cc6ca-52ef-59f5-83cd-3a7055c09341"
version = "1.8.0"

[[deps.FFTW_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "c6033cc3892d0ef5bb9cd29b7f2f0331ea5184ea"
uuid = "f5851436-0d7a-5f13-b9de-f02708fd171a"
version = "3.3.10+0"

[[deps.FileIO]]
deps = ["Pkg", "Requires", "UUIDs"]
git-tree-sha1 = "c5c28c245101bd59154f649e19b038d15901b5dc"
uuid = "5789e2e9-d7fb-5bc7-8068-2c6fae9b9549"
version = "1.16.2"

[[deps.FileWatching]]
uuid = "7b1f6079-737a-58dc-b8bc-7a2ca5c1b5ee"

[[deps.Future]]
deps = ["Random"]
uuid = "9fa8497b-333b-5362-9e8d-4d0656e87820"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "2e57b4a4f9cc15e85a24d603256fe08e527f48d1"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.8.1"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "2d6ca471a6c7b536127afccfa7564b5b39227fe0"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.5"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "72b2e3c2ba583d1a7aa35129e56cf92e07c083e3"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.21.4"

[[deps.GitHub]]
deps = ["Base64", "Dates", "HTTP", "JSON", "MbedTLS", "Sockets", "SodiumSeal", "URIs"]
git-tree-sha1 = "7ee730a8484d673a8ce21d8536acfe6494475994"
uuid = "bc5e4493-9b4d-5f90-b8aa-2b2bcaad7a26"
version = "5.9.0"

[[deps.Glob]]
git-tree-sha1 = "97285bbd5230dd766e9ef6749b80fc617126d496"
uuid = "c27321d9-0574-5035-807b-f59d2c89b15c"
version = "1.3.1"

[[deps.Graphs]]
deps = ["ArnoldiMethod", "Compat", "DataStructures", "Distributed", "Inflate", "LinearAlgebra", "Random", "SharedArrays", "SimpleTraits", "SparseArrays", "Statistics"]
git-tree-sha1 = "899050ace26649433ef1af25bc17a815b3db52b7"
uuid = "86223c79-3864-5bf0-83f7-82e725a168b6"
version = "1.9.0"

[[deps.HDF5]]
deps = ["Compat", "HDF5_jll", "Libdl", "MPIPreferences", "Mmap", "Preferences", "Printf", "Random", "Requires", "UUIDs"]
git-tree-sha1 = "26407bd1c60129062cec9da63dc7d08251544d53"
uuid = "f67ccb44-e63f-5c2f-98bd-6dc0ccc4ba2f"
version = "0.17.1"

    [deps.HDF5.extensions]
    MPIExt = "MPI"

    [deps.HDF5.weakdeps]
    MPI = "da04e1cc-30fd-572f-bb4f-1f8673147195"

[[deps.HDF5_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "LibCURL_jll", "Libdl", "MPICH_jll", "MPIPreferences", "MPItrampoline_jll", "MicrosoftMPI_jll", "OpenMPI_jll", "OpenSSL_jll", "TOML", "Zlib_jll", "libaec_jll"]
git-tree-sha1 = "e4591176488495bf44d7456bd73179d87d5e6eab"
uuid = "0234f1f7-429e-5d53-9886-15a909be8d59"
version = "1.14.3+1"

[[deps.HTTP]]
deps = ["Base64", "CodecZlib", "ConcurrentUtilities", "Dates", "ExceptionUnwrapping", "Logging", "LoggingExtras", "MbedTLS", "NetworkOptions", "OpenSSL", "Random", "SimpleBufferStream", "Sockets", "URIs", "UUIDs"]
git-tree-sha1 = "db864f2d91f68a5912937af80327d288ea1f3aee"
uuid = "cd3eb016-35fb-5094-929b-558a96fad6f3"
version = "1.10.3"

[[deps.Hwloc_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "ca0f6bf568b4bfc807e7537f081c81e35ceca114"
uuid = "e33a78d0-f292-5ffc-b300-72abe9b543c8"
version = "2.10.0+0"

[[deps.Inflate]]
git-tree-sha1 = "ea8031dea4aff6bd41f1df8f2fdfb25b33626381"
uuid = "d25df0c9-e2be-5dd7-82c8-3ad0b3e990b9"
version = "0.1.4"

[[deps.IntelOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ad37c091f7d7daf900963171600d7c1c5c3ede32"
uuid = "1d5cc7b8-4909-519e-a0f8-d0f5ad9712d0"
version = "2023.2.0+0"

[[deps.InteractiveUtils]]
deps = ["Markdown"]
uuid = "b77e0a4c-d291-57a0-90e8-8db25a27a240"

[[deps.IrrationalConstants]]
git-tree-sha1 = "630b497eafcc20001bba38a4651b327dcfc491d2"
uuid = "92d709cd-6900-40b7-9082-c6be49f344b6"
version = "0.2.2"

[[deps.IteratorInterfaceExtensions]]
git-tree-sha1 = "a3f24677c21f5bbe9d2a714f95dcd58337fb2856"
uuid = "82899510-4779-5014-852e-03e436cf321d"
version = "1.0.0"

[[deps.JLD2]]
deps = ["FileIO", "MacroTools", "Mmap", "OrderedCollections", "Pkg", "PrecompileTools", "Printf", "Reexport", "Requires", "TranscodingStreams", "UUIDs"]
git-tree-sha1 = "5ea6acdd53a51d897672edb694e3cc2912f3f8a7"
uuid = "033835bb-8acc-5ee8-8aae-3f567f8a3819"
version = "0.4.46"

[[deps.JLLWrappers]]
deps = ["Artifacts", "Preferences"]
git-tree-sha1 = "7e5d6779a1e09a36db2a7b6cff50942a0a7d0fca"
uuid = "692b3bcd-3c85-4b1f-b108-f13ce0eb3210"
version = "1.5.0"

[[deps.JSON]]
deps = ["Dates", "Mmap", "Parsers", "Unicode"]
git-tree-sha1 = "31e996f0a15c7b280ba9f76636b3ff9e2ae58c9a"
uuid = "682c06a0-de6a-54ab-a142-c8b1cf79cde6"
version = "0.21.4"

[[deps.JuliaFormatter]]
deps = ["CSTParser", "CommonMark", "DataStructures", "Glob", "Pkg", "PrecompileTools", "Tokenize"]
git-tree-sha1 = "1954b04bf7ce17ed708ce9059d05881f58f07845"
uuid = "98e50ef6-434e-11e9-1051-2b60c6c9e899"
version = "1.0.52"

[[deps.KernelAbstractions]]
deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "ed7167240f40e62d97c1f5f7735dea6de3cc5c49"
uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
version = "0.9.18"

    [deps.KernelAbstractions.extensions]
    EnzymeExt = "EnzymeCore"

    [deps.KernelAbstractions.weakdeps]
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "ddab4d40513bce53c8e3157825e245224f74fae7"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "6.6.0"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "88b916503aac4fb7f701bb625cd84ca5dd1677bc"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.29+0"

[[deps.LRUCache]]
git-tree-sha1 = "b3cc6698599b10e652832c2f23db3cab99d51b59"
uuid = "8ac3fa9e-de4c-5943-b1dc-09c6b5f20637"
version = "1.6.1"
weakdeps = ["Serialization"]

    [deps.LRUCache.extensions]
    SerializationExt = ["Serialization"]

[[deps.LabelledGraphs]]
deps = ["Graphs", "MetaGraphs"]
git-tree-sha1 = "436f40ecb7360aed88ed69893c659b35d738919b"
uuid = "605abd48-4d17-4660-b914-d4df33194460"
version = "0.4.4"

[[deps.LazilyInitializedFields]]
git-tree-sha1 = "8f7f3cabab0fd1800699663533b6d5cb3fc0e612"
uuid = "0e77f7df-68c5-4e49-93ce-4cd80f5598bf"
version = "1.2.2"

[[deps.LazyArtifacts]]
deps = ["Artifacts", "Pkg"]
uuid = "4af54fe1-eca0-43a8-85a7-787d91b784e3"

[[deps.LazyStack]]
deps = ["ChainRulesCore", "Compat", "LinearAlgebra"]
git-tree-sha1 = "aff621f1f49e9262a34aaf0d57d02ea3b35aec60"
uuid = "1fad7336-0346-5a1a-a56f-a06ba010965b"
version = "0.1.3"

[[deps.LibCURL]]
deps = ["LibCURL_jll", "MozillaCACerts_jll"]
uuid = "b27032c2-a3e7-50c8-80cd-2d36dbcbfd21"
version = "0.6.4"

[[deps.LibCURL_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll", "Zlib_jll", "nghttp2_jll"]
uuid = "deac9b47-8bc7-5906-a0fe-35ac56dc84c0"
version = "8.4.0+0"

[[deps.LibGit2]]
deps = ["Base64", "LibGit2_jll", "NetworkOptions", "Printf", "SHA"]
uuid = "76f85450-5226-5b5a-8eaa-529ad045b433"

[[deps.LibGit2_jll]]
deps = ["Artifacts", "LibSSH2_jll", "Libdl", "MbedTLS_jll"]
uuid = "e37daf67-58a4-590a-8e99-b0245dd2ffc5"
version = "1.6.4+0"

[[deps.LibSSH2_jll]]
deps = ["Artifacts", "Libdl", "MbedTLS_jll"]
uuid = "29816b5a-b9ab-546f-933c-edad1886dfa8"
version = "1.11.0+1"

[[deps.Libdl]]
uuid = "8f399da3-3557-5675-b5ff-fb832c97cbdb"

[[deps.LicenseCheck]]
deps = ["licensecheck_jll"]
git-tree-sha1 = "e98bc9e1f773123cfdb1fa3d7a4c898e1f030341"
uuid = "726dbf0d-6eb6-41af-b36c-cd770e0f00cc"
version = "0.2.2"

[[deps.LinearAlgebra]]
deps = ["Libdl", "OpenBLAS_jll", "libblastrampoline_jll"]
uuid = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"

[[deps.LocalRegistry]]
deps = ["Random", "RegistryInstances", "RegistryTools", "TOML", "UUIDs"]
git-tree-sha1 = "d01c44f41135b4e39656201ae77a95aeba9fe395"
uuid = "89398ba2-070a-4b16-a995-9893c55d93cf"
version = "0.5.6"

[[deps.LogExpFunctions]]
deps = ["DocStringExtensions", "IrrationalConstants", "LinearAlgebra"]
git-tree-sha1 = "18144f3e9cbe9b15b070288eef858f71b291ce37"
uuid = "2ab3a3ac-af41-5b50-aa03-7779005ae688"
version = "0.3.27"

    [deps.LogExpFunctions.extensions]
    LogExpFunctionsChainRulesCoreExt = "ChainRulesCore"
    LogExpFunctionsChangesOfVariablesExt = "ChangesOfVariables"
    LogExpFunctionsInverseFunctionsExt = "InverseFunctions"

    [deps.LogExpFunctions.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    ChangesOfVariables = "9e997f8a-9a97-42d5-a9f1-ce6bfc15e2c0"
    InverseFunctions = "3587e190-3f89-42d0-90ee-14403ec27112"

[[deps.Logging]]
uuid = "56ddb016-857b-54e1-b83d-db4d58db5568"

[[deps.LoggingExtras]]
deps = ["Dates", "Logging"]
git-tree-sha1 = "c1dd6d7978c12545b4179fb6153b9250c96b0075"
uuid = "e6f89c97-d47a-5376-807f-9c37f3926c36"
version = "1.0.3"

[[deps.LowRankApprox]]
deps = ["FFTW", "LinearAlgebra", "LowRankMatrices", "Nullables", "Random", "SparseArrays"]
git-tree-sha1 = "031af63ba945e23424815014ba0e59c28f5aed32"
uuid = "898213cb-b102-5a47-900c-97e73b919f73"
version = "0.5.5"

[[deps.LowRankMatrices]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "7c8664b2f3d5c3d9b77605c03d53b18813e79b0f"
uuid = "e65ccdef-c354-471a-8090-89bec1c20ec3"
version = "1.0.1"

    [deps.LowRankMatrices.extensions]
    LowRankMatricesFillArraysExt = "FillArrays"

    [deps.LowRankMatrices.weakdeps]
    FillArrays = "1a297f60-69ca-5386-bcde-b61e274b549b"

[[deps.MKL]]
deps = ["Artifacts", "Libdl", "LinearAlgebra", "MKL_jll", "PackageCompiler"]
git-tree-sha1 = "e8434bb06f4f5695111c8bb9b54ed2f4b8d3b30d"
uuid = "33e6dc65-8f57-5167-99aa-e5a354878fb2"
version = "0.4.3"

[[deps.MKL_jll]]
deps = ["Artifacts", "IntelOpenMP_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "2ce8695e1e699b68702c03402672a69f54b8aca9"
uuid = "856f044c-d86e-5d09-b602-aeab76dc8ba7"
version = "2022.2.0+0"

[[deps.MPICH_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "656036b9ed6f942d35e536e249600bc31d0f9df8"
uuid = "7cb0a576-ebde-5e09-9194-50597f1243b4"
version = "4.2.0+0"

[[deps.MPIPreferences]]
deps = ["Libdl", "Preferences"]
git-tree-sha1 = "8f6af051b9e8ec597fa09d8885ed79fd582f33c9"
uuid = "3da0fdf6-3ccc-4f1b-acd9-58baa6c99267"
version = "0.1.10"

[[deps.MPItrampoline_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "77c3bd69fdb024d75af38713e883d0f249ce19c2"
uuid = "f1f71cc9-e9ae-5b93-9b94-4fe0e1ad3748"
version = "5.3.2+0"

[[deps.MacroTools]]
deps = ["Markdown", "Random"]
git-tree-sha1 = "2fa9ee3e63fd3a4f7a9a4f4744a52f4856de82df"
uuid = "1914dd2f-81c6-5fcd-8719-6d5c9610ff09"
version = "0.5.13"

[[deps.Markdown]]
deps = ["Base64"]
uuid = "d6f4376e-aef5-505a-96c1-9c027394607a"

[[deps.MbedTLS]]
deps = ["Dates", "MbedTLS_jll", "MozillaCACerts_jll", "NetworkOptions", "Random", "Sockets"]
git-tree-sha1 = "c067a280ddc25f196b5e7df3877c6b226d390aaf"
uuid = "739be429-bea8-5141-9913-cc70e7f3736d"
version = "1.1.9"

[[deps.MbedTLS_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "c8ffd9c3-330d-5841-b78e-0817d7145fa1"
version = "2.28.2+1"

[[deps.Memoization]]
deps = ["MacroTools"]
git-tree-sha1 = "073f080e733bc6697411901224ed4fd15fefaffa"
uuid = "6fafb56a-5788-4b4e-91ca-c0cea6611c73"
version = "0.2.1"

[[deps.MetaGraphs]]
deps = ["Graphs", "JLD2", "Random"]
git-tree-sha1 = "1130dbe1d5276cb656f6e1094ce97466ed700e5a"
uuid = "626554b9-1ddb-594c-aa3c-2596fe9399a5"
version = "0.7.2"

[[deps.MicrosoftMPI_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "f12a29c4400ba812841c6ace3f4efbb6dbb3ba01"
uuid = "9237b28f-5490-5468-be7b-bb81f5f5e6cf"
version = "10.1.4+2"

[[deps.Mmap]]
uuid = "a63ad114-7e13-5084-954f-fe012c677804"

[[deps.Mocking]]
deps = ["Compat", "ExprTools"]
git-tree-sha1 = "4cc0c5a83933648b615c36c2b956d94fda70641e"
uuid = "78c3b35d-d492-501b-9361-3d52fe80e533"
version = "0.7.7"

[[deps.MozillaCACerts_jll]]
uuid = "14a3606d-f60d-562e-9121-12d972cd8159"
version = "2023.1.10"

[[deps.NNlib]]
deps = ["Adapt", "Atomix", "ChainRulesCore", "GPUArraysCore", "KernelAbstractions", "LinearAlgebra", "Pkg", "Random", "Requires", "Statistics"]
git-tree-sha1 = "877f15c331337d54cf24c797d5bcb2e48ce21221"
uuid = "872c559c-99b0-510c-b3b7-b6c96a88d5cd"
version = "0.9.12"

    [deps.NNlib.extensions]
    NNlibAMDGPUExt = "AMDGPU"
    NNlibCUDACUDNNExt = ["CUDA", "cuDNN"]
    NNlibCUDAExt = "CUDA"
    NNlibEnzymeCoreExt = "EnzymeCore"

    [deps.NNlib.weakdeps]
    AMDGPU = "21141c5a-9bdb-4563-92ae-f87d6854732e"
    CUDA = "052768ef-5323-5732-b1bb-66c8b64840ba"
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"
    cuDNN = "02a925ec-e4fe-4b08-9a7e-0d78e3d38ccd"

[[deps.NetworkOptions]]
uuid = "ca575930-c2e3-43a9-ace4-1e988b2c1908"
version = "1.2.0"

[[deps.Nullables]]
git-tree-sha1 = "8f87854cc8f3685a60689d8edecaa29d2251979b"
uuid = "4d1e1d77-625e-5b40-9113-a560ec7a8ecd"
version = "1.0.0"

[[deps.OpenBLAS_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Libdl"]
uuid = "4536629a-c528-5b80-bd46-f80d51c5b363"
version = "0.3.23+4"

[[deps.OpenLibm_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "05823500-19ac-5b8b-9628-191a04bc5112"
version = "0.8.1+2"

[[deps.OpenMPI_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML"]
git-tree-sha1 = "e25c1778a98e34219a00455d6e4384e017ea9762"
uuid = "fe0851c0-eecd-5654-98d4-656369965a5c"
version = "4.1.6+0"

[[deps.OpenSSL]]
deps = ["BitFlags", "Dates", "MozillaCACerts_jll", "OpenSSL_jll", "Sockets"]
git-tree-sha1 = "af81a32750ebc831ee28bdaaba6e1067decef51e"
uuid = "4d8831e6-92b7-49fb-bdf8-b643e874388c"
version = "1.4.2"

[[deps.OpenSSL_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "60e3045590bd104a16fefb12836c00c0ef8c7f8c"
uuid = "458c3c95-2e84-50aa-8efc-19380b2a3a95"
version = "3.0.13+0"

[[deps.OpenSpecFun_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "13652491f6856acfd2db29360e1bbcd4565d04f1"
uuid = "efe28fd5-8261-553b-a9e1-b2916fc3738e"
version = "0.5.5+0"

[[deps.OrderedCollections]]
git-tree-sha1 = "dfdf5519f235516220579f949664f1bf44e741c5"
uuid = "bac558e1-5e72-5ebc-8fee-abe8a469f55d"
version = "1.6.3"

[[deps.PackageCompiler]]
deps = ["Artifacts", "LazyArtifacts", "Libdl", "Pkg", "RelocatableFolders", "UUIDs"]
git-tree-sha1 = "a16924b37299cc7d6106fac255b44a8c79c7c21f"
uuid = "9b87118b-4619-50d2-8e1e-99f35a4d4d9d"
version = "1.7.7"

[[deps.PackageExtensionCompat]]
git-tree-sha1 = "fb28e33b8a95c4cee25ce296c817d89cc2e53518"
uuid = "65ce6f38-6b18-4e1d-a461-8949797d7930"
version = "1.0.2"
weakdeps = ["Requires", "TOML"]

[[deps.Parsers]]
deps = ["Dates"]
git-tree-sha1 = "bfd7d8c7fd87f04543810d9cbd3995972236ba1b"
uuid = "69de0a69-1ddd-5017-9359-2bf0b02dc9f0"
version = "1.1.2"

[[deps.Pkg]]
deps = ["Artifacts", "Dates", "Downloads", "FileWatching", "LibGit2", "Libdl", "Logging", "Markdown", "Printf", "REPL", "Random", "SHA", "Serialization", "TOML", "Tar", "UUIDs", "p7zip_jll"]
uuid = "44cfe95a-1eb2-52ea-b672-e2afdf69b78f"
version = "1.10.0"

[[deps.PooledArrays]]
deps = ["DataAPI", "Future"]
git-tree-sha1 = "36d8b4b899628fb92c2749eb488d884a926614d3"
uuid = "2dfb63ee-cc39-5dd5-95bd-886bf059d720"
version = "1.4.3"

[[deps.PrecompileTools]]
deps = ["Preferences"]
git-tree-sha1 = "03b4c25b43cb84cee5c90aa9b5ea0a78fd848d2f"
uuid = "aea7be01-6a6a-4083-8856-8a6e6704d82a"
version = "1.2.0"

[[deps.Preferences]]
deps = ["TOML"]
git-tree-sha1 = "9306f6085165d270f7e3db02af26a400d580f5c6"
uuid = "21216c6a-2e73-6563-6e65-726566657250"
version = "1.4.3"

[[deps.Printf]]
deps = ["Unicode"]
uuid = "de0858da-6303-5e67-8744-51eddeeeb8d7"

[[deps.ProgressMeter]]
deps = ["Distributed", "Printf"]
git-tree-sha1 = "763a8ceb07833dd51bb9e3bbca372de32c0605ad"
uuid = "92933f4c-e287-5a05-a399-4b506db050ca"
version = "1.10.0"

[[deps.REPL]]
deps = ["InteractiveUtils", "Markdown", "Sockets", "Unicode"]
uuid = "3fa0cd96-eef1-5676-8a61-b3b8758bbffb"

[[deps.Random]]
deps = ["SHA"]
uuid = "9a3f8284-a2c9-5f02-9a11-845980a1fd5c"

[[deps.Random123]]
deps = ["Random", "RandomNumbers"]
git-tree-sha1 = "4743b43e5a9c4a2ede372de7061eed81795b12e7"
uuid = "74087812-796a-5b5d-8853-05524746bad3"
version = "1.7.0"

[[deps.RandomNumbers]]
deps = ["Random", "Requires"]
git-tree-sha1 = "043da614cc7e95c703498a491e2c21f58a2b8111"
uuid = "e6cf234a-135c-5ec9-84dd-332b85af5143"
version = "1.5.3"

[[deps.RecipesBase]]
deps = ["PrecompileTools"]
git-tree-sha1 = "5c3d09cc4f31f5fc6af001c250bf1278733100ff"
uuid = "3cdcf5f2-1ef4-517c-9805-6587b60abb01"
version = "1.3.4"

[[deps.Reexport]]
git-tree-sha1 = "45e428421666073eab6f2da5c9d310d99bb12f9b"
uuid = "189a3867-3050-52da-a836-e630ba90ab69"
version = "1.2.2"

[[deps.RegistryCI]]
deps = ["Base64", "Dates", "GitHub", "HTTP", "JSON", "LibGit2", "LicenseCheck", "Pkg", "Printf", "Random", "RegistryTools", "SHA", "StringDistances", "TOML", "Tar", "Test", "TimeZones", "VisualStringDistances"]
git-tree-sha1 = "f2719e7e9ddbb8137531f93599222ca0a103c18f"
uuid = "0c95cc5f-2f7e-43fe-82dd-79dbcba86b32"
version = "10.0.1"

[[deps.RegistryInstances]]
deps = ["LazilyInitializedFields", "Pkg", "TOML", "Tar"]
git-tree-sha1 = "ffd19052caf598b8653b99404058fce14828be51"
uuid = "2792f1a3-b283-48e8-9a74-f99dce5104f3"
version = "0.1.0"

[[deps.RegistryTools]]
deps = ["AutoHashEquals", "LibGit2", "Pkg", "SHA", "UUIDs"]
git-tree-sha1 = "3dd9eaa965a2925b0a34d994b4d886d797f54b20"
uuid = "d1eb7eb1-105f-429d-abf5-b0f65cb9e2c4"
version = "2.2.3"

[[deps.RelocatableFolders]]
deps = ["SHA", "Scratch"]
git-tree-sha1 = "cdbd3b1338c72ce29d9584fdbe9e9b70eeb5adca"
uuid = "05181044-ff0b-4ac5-8273-598c1e38db00"
version = "0.1.3"

[[deps.Requires]]
deps = ["UUIDs"]
git-tree-sha1 = "838a3a4188e2ded87a4f9f184b4b0d78a1e91cb7"
uuid = "ae029012-a4dd-5104-9daa-d747884805df"
version = "1.3.0"

[[deps.SHA]]
uuid = "ea8e919c-243c-51af-8825-aaa63cd721ce"
version = "0.7.0"

[[deps.Scratch]]
deps = ["Dates"]
git-tree-sha1 = "3bac05bc7e74a75fd9cba4295cde4045d9fe2386"
uuid = "6c6a2e73-6563-6170-7368-637461726353"
version = "1.2.1"

[[deps.SentinelArrays]]
deps = ["Dates", "Random"]
git-tree-sha1 = "0e7508ff27ba32f26cd459474ca2ede1bc10991f"
uuid = "91c51154-3ec4-41a3-a24f-3f23e20d615c"
version = "1.4.1"

[[deps.Serialization]]
uuid = "9e88b42a-f829-5b0c-bbe9-9e923198166b"

[[deps.SharedArrays]]
deps = ["Distributed", "Mmap", "Random", "Serialization"]
uuid = "1a1011a3-84de-559e-8e89-a11a2f7dc383"

[[deps.SimpleBufferStream]]
git-tree-sha1 = "874e8867b33a00e784c8a7e4b60afe9e037b74e1"
uuid = "777ac1f9-54b0-4bf8-805c-2214025038e7"
version = "1.1.0"

[[deps.SimpleTraits]]
deps = ["InteractiveUtils", "MacroTools"]
git-tree-sha1 = "5d7e3f4e11935503d3ecaf7186eac40602e7d231"
uuid = "699a6c99-e7fa-54fc-8d76-47d257e15c1d"
version = "0.9.4"

[[deps.Sockets]]
uuid = "6462fe0b-24de-5631-8697-dd941f90decc"

[[deps.SodiumSeal]]
deps = ["Base64", "Libdl", "libsodium_jll"]
git-tree-sha1 = "80cef67d2953e33935b41c6ab0a178b9987b1c99"
uuid = "2133526b-2bfb-4018-ac12-889fb3908a75"
version = "0.1.1"

[[deps.SparseArrays]]
deps = ["Libdl", "LinearAlgebra", "Random", "Serialization", "SuiteSparse_jll"]
uuid = "2f01184e-e22b-5df5-ae63-d93ebab69eaf"
version = "1.10.0"

[[deps.SpecialFunctions]]
deps = ["IrrationalConstants", "LogExpFunctions", "OpenLibm_jll", "OpenSpecFun_jll"]
git-tree-sha1 = "e2cfc4012a19088254b3950b85c3c1d8882d864d"
uuid = "276daf66-3868-5448-9aa4-cd146d93841b"
version = "2.3.1"
weakdeps = ["ChainRulesCore"]

    [deps.SpecialFunctions.extensions]
    SpecialFunctionsChainRulesCoreExt = "ChainRulesCore"

[[deps.SpinGlassEngine]]
deps = ["CUDA", "DocStringExtensions", "Graphs", "LabelledGraphs", "LinearAlgebra", "MKL", "Memoization", "MetaGraphs", "NNlib", "ProgressMeter", "SpinGlassNetworks", "SpinGlassTensors", "Statistics", "TensorCast", "TensorOperations"]
path = "../../../new-zksi-repo/SpinGlassEngine.jl"
uuid = "0563570f-ea1b-4080-8a64-041ac6565a4e"
version = "1.0.0"

[[deps.SpinGlassNetworks]]
deps = ["CSV", "CUDA", "DocStringExtensions", "Graphs", "HDF5", "JLD2", "LabelledGraphs", "LinearAlgebra", "MKL", "MetaGraphs", "SparseArrays", "SpinGlassTensors", "TensorCast"]
path = "../../../new-zksi-repo/SpinGlassNetworks.jl"
uuid = "b7f6bd3e-55dc-4da6-96a9-ef9dbec6ac19"
version = "1.0.0"

[[deps.SpinGlassTensors]]
deps = ["CUDA", "DocStringExtensions", "LinearAlgebra", "LowRankApprox", "MKL", "Memoization", "NNlib", "SparseArrays", "TSVD", "TensorCast", "TensorOperations", "TransmuteDims", "cuTENSOR"]
path = "../../../new-zksi-repo/SpinGlassTensors.jl"
uuid = "7584fc6a-5a23-4eeb-8277-827aab0146ea"
version = "1.0.0"

[[deps.StaticArrays]]
deps = ["LinearAlgebra", "PrecompileTools", "Random", "StaticArraysCore"]
git-tree-sha1 = "bf074c045d3d5ffd956fa0a461da38a44685d6b2"
uuid = "90137ffa-7385-5640-81b9-e52037218182"
version = "1.9.3"
weakdeps = ["ChainRulesCore", "Statistics"]

    [deps.StaticArrays.extensions]
    StaticArraysChainRulesCoreExt = "ChainRulesCore"
    StaticArraysStatisticsExt = "Statistics"

[[deps.StaticArraysCore]]
git-tree-sha1 = "36b3d696ce6366023a0ea192b4cd442268995a0d"
uuid = "1e83bf80-4336-4d27-bf5d-d5a4f845583c"
version = "1.4.2"

[[deps.Statistics]]
deps = ["LinearAlgebra", "SparseArrays"]
uuid = "10745b16-79ce-11e8-11f9-7d13ad32a3b2"
version = "1.10.0"

[[deps.StatsAPI]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "1ff449ad350c9c4cbc756624d6f8a8c3ef56d3ed"
uuid = "82ae8749-77ed-4fe6-ae5f-f523153014b0"
version = "1.7.0"

[[deps.Strided]]
deps = ["LinearAlgebra", "StridedViews", "TupleTools"]
git-tree-sha1 = "40c69be0e1b72ee2f42923b7d1ff13e0b04e675c"
uuid = "5e0ebb24-38b0-5f93-81fe-25c709ecae67"
version = "2.0.4"

[[deps.StridedViews]]
deps = ["LinearAlgebra", "PackageExtensionCompat"]
git-tree-sha1 = "5b765c4e401693ab08981989f74a36a010aa1d8e"
uuid = "4db3bf67-4bd7-4b4e-b153-31dc3fb37143"
version = "0.2.2"
weakdeps = ["CUDA"]

    [deps.StridedViews.extensions]
    StridedViewsCUDAExt = "CUDA"

[[deps.StringDistances]]
deps = ["Distances", "StatsAPI"]
git-tree-sha1 = "ceeef74797d961aee825aabf71446d6aba898acb"
uuid = "88034a9c-02f8-509d-84a9-84ec65e18404"
version = "0.11.2"

[[deps.SuiteSparse_jll]]
deps = ["Artifacts", "Libdl", "libblastrampoline_jll"]
uuid = "bea87d4a-7f5b-5778-9afe-8cc45184846c"
version = "7.2.1+1"

[[deps.TOML]]
deps = ["Dates"]
uuid = "fa267f1f-6049-4f14-aa54-33bafae1ed76"
version = "1.0.3"

[[deps.TSVD]]
deps = ["Adapt", "LinearAlgebra"]
git-tree-sha1 = "c39caef6bae501e5607a6caf68dd9ac6e8addbcb"
uuid = "9449cd9e-2762-5aa3-a617-5413e99d722e"
version = "0.4.4"

[[deps.TableTraits]]
deps = ["IteratorInterfaceExtensions"]
git-tree-sha1 = "c06b2f539df1c6efa794486abfb6ed2022561a39"
uuid = "3783bdb8-4a98-5b6b-af9a-565f29a5fe9c"
version = "1.0.1"

[[deps.Tables]]
deps = ["DataAPI", "DataValueInterfaces", "IteratorInterfaceExtensions", "LinearAlgebra", "OrderedCollections", "TableTraits"]
git-tree-sha1 = "cb76cf677714c095e535e3501ac7954732aeea2d"
uuid = "bd369af6-aec1-5ad0-b16a-f7cc5008161c"
version = "1.11.1"

[[deps.Tar]]
deps = ["ArgTools", "SHA"]
uuid = "a4e569a6-e804-4fa4-b0f3-eef7a1d5b13e"
version = "1.10.0"

[[deps.TensorCast]]
deps = ["ChainRulesCore", "Compat", "LazyStack", "LinearAlgebra", "MacroTools", "Random", "StaticArrays", "TransmuteDims"]
git-tree-sha1 = "88423a9e2a1eb7fb2e8c4dd7ede52e28bc5769eb"
uuid = "02d47bb6-7ce6-556a-be16-bb1710789e2b"
version = "0.4.6"

[[deps.TensorOperations]]
deps = ["LRUCache", "LinearAlgebra", "PackageExtensionCompat", "Strided", "StridedViews", "TupleTools", "VectorInterface"]
git-tree-sha1 = "59bcc1e51aa8c7489fa64d4cab0c4b0202edfc0e"
uuid = "6aa20fa7-93e2-5fca-9bc0-fbd0db3c71a2"
version = "4.1.0"
weakdeps = ["CUDA", "ChainRulesCore", "cuTENSOR"]

    [deps.TensorOperations.extensions]
    TensorOperationsChainRulesCoreExt = "ChainRulesCore"
    TensorOperationscuTENSORExt = ["cuTENSOR", "CUDA"]

[[deps.Test]]
deps = ["InteractiveUtils", "Logging", "Random", "Serialization"]
uuid = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.TimeZones]]
deps = ["Dates", "Future", "LazyArtifacts", "Mocking", "Pkg", "Printf", "RecipesBase", "Serialization", "Unicode"]
git-tree-sha1 = "a5688ffdbd849a98503c6650effe79fe89a41252"
uuid = "f269a46b-ccf7-5d73-abea-4c690281aa53"
version = "1.5.9"

[[deps.TimerOutputs]]
deps = ["ExprTools", "Printf"]
git-tree-sha1 = "f548a9e9c490030e545f72074a41edfd0e5bcdd7"
uuid = "a759f4b9-e2f1-59dc-863e-4aeb61b1ea8f"
version = "0.5.23"

[[deps.Tokenize]]
git-tree-sha1 = "5b5a892ba7704c0977013bd0f9c30f5d962181e0"
uuid = "0796e94c-ce3b-5d07-9a54-7f471281c624"
version = "0.5.28"

[[deps.TranscodingStreams]]
git-tree-sha1 = "3caa21522e7efac1ba21834a03734c57b4611c7e"
uuid = "3bb67fe8-82b1-5028-8e26-92a6c54297fa"
version = "0.10.4"
weakdeps = ["Random", "Test"]

    [deps.TranscodingStreams.extensions]
    TestExt = ["Test", "Random"]

[[deps.TransmuteDims]]
deps = ["Adapt", "ChainRulesCore", "GPUArraysCore", "LinearAlgebra", "Requires", "Strided"]
git-tree-sha1 = "5b6f1f2ba5e91983eabc47cb362f92d9a96b579f"
repo-rev = "strided2"
repo-url = "https://github.com/mcabbott/TransmuteDims.jl.git"
uuid = "24ddb15e-299a-5cc3-8414-dbddc482d9ca"
version = "0.1.16"

[[deps.TupleTools]]
git-tree-sha1 = "41d61b1c545b06279871ef1a4b5fcb2cac2191cd"
uuid = "9d95972d-f1c8-5527-a6e0-b4b365fa01f6"
version = "1.5.0"

[[deps.URIs]]
git-tree-sha1 = "67db6cc7b3821e19ebe75791a9dd19c9b1188f2b"
uuid = "5c2747f8-b7ea-4ff2-ba2e-563bfd36b1d4"
version = "1.5.1"

[[deps.UUIDs]]
deps = ["Random", "SHA"]
uuid = "cf7118a7-6976-5b1a-9a39-7adc72f591a4"

[[deps.UnbalancedOptimalTransport]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "9af2572277f564035f452ec82c56de795f523c45"
uuid = "6f61b460-fd45-461a-bdf7-98edd72e362f"
version = "0.2.1"

[[deps.Unicode]]
uuid = "4ec0a83e-493e-50e2-b9ac-8f72acf5a8f5"

[[deps.UnsafeAtomics]]
git-tree-sha1 = "6331ac3440856ea1988316b46045303bef658278"
uuid = "013be700-e6cd-48c3-b4a1-df204f14c38f"
version = "0.2.1"

[[deps.UnsafeAtomicsLLVM]]
deps = ["LLVM", "UnsafeAtomics"]
git-tree-sha1 = "323e3d0acf5e78a56dfae7bd8928c989b4f3083e"
uuid = "d80eeb9a-aca5-4d75-85e5-170c8b632249"
version = "0.1.3"

[[deps.VectorInterface]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "ed8f91274e744e5030c349e76fa98bf68236766f"
uuid = "409d34a3-91d5-4945-b6ec-7529ddf182d8"
version = "0.4.4"

[[deps.VisualStringDistances]]
deps = ["DelimitedFiles", "LinearAlgebra", "StaticArrays", "UnbalancedOptimalTransport"]
git-tree-sha1 = "0b2c7d9d5c16629f165d03b8769a07ffd44c6ad7"
uuid = "089bb0c6-1854-47b9-96f7-327dbbe09dca"
version = "0.1.1"

[[deps.Zlib_jll]]
deps = ["Libdl"]
uuid = "83775a58-1f1d-513f-b197-d71354ab007a"
version = "1.2.13+1"

[[deps.cuTENSOR]]
deps = ["CEnum", "CUDA", "CUTENSOR_jll", "LinearAlgebra"]
git-tree-sha1 = "c5c63b96b5fb0c7133145eb967882142420e6272"
uuid = "011b41b2-24ef-40a8-b3eb-fa098493e9e1"
version = "1.1.0"

[[deps.libaec_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "46bf7be2917b59b761247be3f317ddf75e50e997"
uuid = "477f73a3-ac25-53e9-8cc3-50b2fa2566f0"
version = "1.1.2+0"

[[deps.libblastrampoline_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850b90-86db-534c-a0d3-1478176c7d93"
version = "5.8.0+1"

[[deps.libsodium_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "848ab3d00fe39d6fbc2a8641048f8f272af1c51e"
uuid = "a9144af2-ca23-56d9-984f-0d03f7b5ccf8"
version = "1.0.20+0"

[[deps.licensecheck_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "b790ad21ac235c39c0eb34214ccf3d5f5ea60efa"
uuid = "4ecb348a-8b88-51ea-b912-4c460483ee91"
version = "0.3.101+0"

[[deps.nghttp2_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "8e850ede-7688-5339-a07c-302acd2aaf8d"
version = "1.52.0+1"

[[deps.p7zip_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "3f19e933-33d8-53b3-aaab-bd5110c3b7a0"
version = "17.4.0+2"

Expected behavior

Correct 1x1 matrix multiplication.

Version info

Details on Julia:

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 20 × 12th Gen Intel(R) Core(TM) i7-12700KF
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, alderlake)
Threads: 1 default, 0 interactive, 1 GC (on 20 virtual cores)

Details on CUDA:

julia> CUDA.versioninfo()
CUDA runtime 12.1, artifact installation
CUDA driver 12.0
NVIDIA driver 525.147.5

CUDA libraries: 
- CUBLAS: 12.1.3
- CURAND: 10.3.2
- CUFFT: 11.0.2
- CUSOLVER: 11.4.5
- CUSPARSE: 12.1.0
- CUPTI: 18.0.0
- NVML: 12.0.0+525.147.5

Julia packages: 
- CUDA: 4.4.1
- CUDA_Driver_jll: 0.5.0+1
- CUDA_Runtime_jll: 0.6.0+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: NVIDIA GeForce RTX 3080 (sm_86, 7.963 GiB / 10.000 GiB available)
@lpawela lpawela added the bug Something isn't working label Mar 20, 2024
@lpawela
Copy link
Contributor Author

lpawela commented Mar 20, 2024

I also tried reproducing on some different systems, and the same errors occur on:

julia> CUDA.versioninfo()
CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 545.23.6

CUDA libraries: 
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+545.23.6

Julia packages: 
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA RTX A6000 (sm_86, 44.256 GiB / 44.988 GiB available)
  1: NVIDIA RTX A6000 (sm_86, 44.548 GiB / 44.988 GiB available)

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 32 × 13th Gen Intel(R) Core(TM) i9-13900K
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)
Threads: 1 default, 0 interactive, 1 GC (on 32 virtual cores)

Strangely, on this system

julia> versioninfo()
Julia Version 1.8.5
Commit 17cfb8e65ea (2023-01-08 06:45 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake-avx512)
  Threads: 1 on 36 virtual cores

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3

CUDA libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3

Julia packages: 
- CUDA.jl: 5.1.2
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.10.1+0
- CUDA_Runtime_Discovery: 0.2.3

Toolchain:
- Julia: 1.8.5
- LLVM: 13.0.1

2 devices:
  0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
  1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)

the CSC examples pass, the CSR one fails. After upgrade to newer Julia and CUDA.jl

julia> CUDA.versioninfo()
CUDA runtime 11.8, artifact installation
CUDA driver 11.4
NVIDIA driver 470.182.3

CUDA libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 11.0.0+470.182.3

Julia packages: 
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.2
- LLVM: 15.0.7

2 devices:
  0: NVIDIA TITAN RTX (sm_75, 23.124 GiB / 23.653 GiB available)
  1: NVIDIA TITAN RTX (sm_75, 23.647 GiB / 23.650 GiB available)

julia> versioninfo()
Julia Version 1.10.2
Commit bd47eca2c8a (2024-03-01 10:14 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 36 × Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, skylake-avx512)
Threads: 1 default, 0 interactive, 1 GC (on 36 virtual cores)

the behavior stays the same - CSC works, CSR does not. Seems that with CUDA 11 the CSC examples work and fail with CUDA 12.

@amontoison
Copy link
Member

amontoison commented Mar 20, 2024

@lpawela
I suspect that the matrices are so small that the CUDA routine that provides the size of the buffer doesn't modify the variable that stores the size because we need an empty buffer.
We already observed this issue with another CUSPARSE routine:
https://github.com/JuliaGPU/CUDA.jl/blob/master/lib/cusparse/conversions.jl#L206.

An hotfix is to specify an initial value 0 for the size of the buffer here.
If we don't specify the initial value, it's a random number and it could represent 113Tb...

@lpawela
Copy link
Contributor Author

lpawela commented Mar 20, 2024

@amontoison setting out = Ref{Csize_t}(UInt64(0)::Csize_t) results in the same errors.

@lpawela
Copy link
Contributor Author

lpawela commented Mar 20, 2024

@amontoison I went through the code setting the buffers to zero in some places (master...lpawela:CUDA.jl:lp/sparse-buffer-size). Now I get the following errors

julia> sparse32csc * dense32 # ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] nonblocking_synchronize(val::CuContext)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:174
 [3] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:185
 [4] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [5] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [6] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

julia> dense32 * sparse32csc # NO ERROR
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] isdone
   @ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
 [5] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [6] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [7] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemAllocFromPoolAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] #alloc#1
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
  [5] alloc
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
  [6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
    @ CUDA ~/lib/CUDA.jl/src/pool.jl:66
  [7] actual_alloc
    @ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
  [8] #1060
    @ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
  [9] retry_reclaim
    @ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #_alloc#1059
    @ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
 [13] _alloc
    @ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
 [14] #alloc#1058
    @ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
 [15] alloc
    @ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
 [16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:74
 [17] CuArray
    @ ~/lib/CUDA.jl/src/array.jl:147 [inlined]
 [18] CuArray
    @ ~/lib/CUDA.jl/src/array.jl:162 [inlined]
 [19] *(A::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}, B::CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32})
    @ CUDA.CUSPARSE ~/lib/CUDA.jl/lib/cusparse/interfaces.jl:129
 [20] top-level scope
    @ REPL[9]:1
 [21] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206

julia> (sparse32csc' * dense32')' # ERROR
WARNING: Error while freeing DeviceBuffer(4 bytes at 0x0000000302001800):
CUDA.CuError(code=CUDA.cudaError_enum(0x000002bc), details=CUDA.Optional{String}(data=nothing))

Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemFreeAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] free(buf::CUDA.Mem.DeviceBuffer; stream::CuStream)
    @ CUDA.Mem ~/lib/CUDA.jl/lib/cudadrv/memory.jl:97
  [5] free
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:92 [inlined]
  [6] #actual_free#1042
    @ ~/lib/CUDA.jl/src/pool.jl:78 [inlined]
  [7] actual_free
    @ ~/lib/CUDA.jl/src/pool.jl:75 [inlined]
  [8] #_free#1067
    @ ~/lib/CUDA.jl/src/pool.jl:523 [inlined]
  [9] _free
    @ ~/lib/CUDA.jl/src/pool.jl:510 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:495 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #free#1066
    @ ~/lib/CUDA.jl/src/pool.jl:494 [inlined]
 [13] free
    @ ~/lib/CUDA.jl/src/pool.jl:483 [inlined]
 [14] (::CUDA.var"#1073#1074"{CUDA.Mem.DeviceBuffer, Bool})()
    @ CUDA ~/lib/CUDA.jl/src/array.jl:101
 [15] #context!#954
    @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
 [16] context!
    @ ~/lib/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
 [17] _free_buffer(buf::CUDA.Mem.DeviceBuffer, early::Bool)
    @ CUDA ~/lib/CUDA.jl/src/array.jl:89
 [18] release(rc::GPUArrays.RefCounted{CUDA.Mem.DeviceBuffer}, args::Bool)
    @ GPUArrays ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:42
 [19] unsafe_free!
    @ ~/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:91 [inlined]
 [20] unsafe_finalize!(xs::CuArray{Float32, 2, CUDA.Mem.DeviceBuffer})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:113
 [21] top-level scope
    @ REPL[10]:1
 [22] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206
 [23] eval
    @ ./boot.jl:385 [inlined]
 [24] eval_user_input(ast::Any, backend::REPL.REPLBackend, mod::Module)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:150
 [25] repl_backend_loop(backend::REPL.REPLBackend, get_module::Function)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:246
 [26] start_repl_backend(backend::REPL.REPLBackend, consumer::Any; get_module::Function)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:231
 [27] run_repl(repl::REPL.AbstractREPL, consumer::Any; backend_on_current_task::Bool, backend::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:389
 [28] run_repl(repl::REPL.AbstractREPL, consumer::Any)
    @ REPL ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/REPL/src/REPL.jl:375
 [29] (::Base.var"#1013#1015"{Bool, Bool, Bool})(REPL::Module)
    @ Base ./client.jl:432
 [30] #invokelatest#2
    @ ./essentials.jl:892 [inlined]
 [31] invokelatest
    @ ./essentials.jl:889 [inlined]
 [32] run_main_repl(interactive::Bool, quiet::Bool, banner::Bool, history_file::Bool, color_set::Bool)
    @ Base ./client.jl:416
 [33] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:333
 [34] _start()
    @ Base ./client.jl:552
ERROR: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
 [1] throw_api_error(res::CUDA.cudaError_enum)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
 [2] isdone
   @ ~/lib/CUDA.jl/lib/cudadrv/stream.jl:111 [inlined]
 [3] spinning_synchronization(f::typeof(CUDA.isdone), obj::CuStream)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:79
 [4] device_synchronize(; blocking::Bool, spin::Bool)
   @ CUDA ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:182
 [5] device_synchronize
   @ ~/lib/CUDA.jl/lib/cudadrv/synchronization.jl:180 [inlined]
 [6] maybe_synchronize_cuda()
   @ CUDA ~/lib/CUDA.jl/src/initialization.jl:217
 [7] top-level scope
   @ ~/lib/CUDA.jl/src/initialization.jl:208

caused by: CUDA error: an illegal memory access was encountered (code 700, ERROR_ILLEGAL_ADDRESS)
Stacktrace:
  [1] throw_api_error(res::CUDA.cudaError_enum)
    @ CUDA ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:30
  [2] check
    @ ~/lib/CUDA.jl/lib/cudadrv/libcuda.jl:37 [inlined]
  [3] cuMemAllocFromPoolAsync
    @ ~/lib/CUDA.jl/lib/utils/call.jl:30 [inlined]
  [4] #alloc#1
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:81 [inlined]
  [5] alloc
    @ ~/lib/CUDA.jl/lib/cudadrv/memory.jl:71 [inlined]
  [6] actual_alloc(bytes::Int64; async::Bool, stream::CuStream, pool::CuMemoryPool)
    @ CUDA ~/lib/CUDA.jl/src/pool.jl:66
  [7] actual_alloc
    @ ~/lib/CUDA.jl/src/pool.jl:59 [inlined]
  [8] #1060
    @ ~/lib/CUDA.jl/src/pool.jl:453 [inlined]
  [9] retry_reclaim
    @ ~/lib/CUDA.jl/src/pool.jl:370 [inlined]
 [10] macro expansion
    @ ~/lib/CUDA.jl/src/pool.jl:452 [inlined]
 [11] macro expansion
    @ ./timing.jl:395 [inlined]
 [12] #_alloc#1059
    @ ~/lib/CUDA.jl/src/pool.jl:448 [inlined]
 [13] _alloc
    @ ~/lib/CUDA.jl/src/pool.jl:444 [inlined]
 [14] #alloc#1058
    @ ~/lib/CUDA.jl/src/pool.jl:434 [inlined]
 [15] alloc
    @ ~/lib/CUDA.jl/src/pool.jl:428 [inlined]
 [16] CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}(::UndefInitializer, dims::Tuple{Int64, Int64})
    @ CUDA ~/lib/CUDA.jl/src/array.jl:74
 [17] similar
    @ ~/lib/CUDA.jl/src/array.jl:196 [inlined]
 [18] similar
    @ ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/adjtrans.jl:361 [inlined]
 [19] *(A::LinearAlgebra.Adjoint{Float32, CUDA.CUSPARSE.CuSparseMatrixCSC{Float32, Int32}}, B::LinearAlgebra.Adjoint{Float32, CuArray{Float32, 2, CUDA.Mem.DeviceBuffer}})
    @ LinearAlgebra ~/.julia/juliaup/julia-1.10.2+0.x64.linux.gnu/share/julia/stdlib/v1.10/LinearAlgebra/src/matmul.jl:106
 [20] top-level scope
    @ REPL[10]:1
 [21] top-level scope
    @ ~/lib/CUDA.jl/src/initialization.jl:206

@amontoison
Copy link
Member

amontoison commented Mar 20, 2024

Can you try with a default buffer size 10000 instead of 0?
Just to check if we still have an error or not.
It seems that we still need a buffer but the size of the buffer is not updated for these small matrices.

@lpawela
Copy link
Contributor Author

lpawela commented Mar 20, 2024

@amontoison Yes, this works, thank you. I started a PR with these changes (#2298). Maybe some other places also need updating?

@amontoison
Copy link
Member

amontoison commented Mar 20, 2024

It's a bug in the NVIDIA routine so we should use it only as a workaround for now.
I will submit an issue to NVIDIA.

@maleadt maleadt linked a pull request Mar 21, 2024 that will close this issue
@maleadt maleadt closed this as completed Mar 21, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants