Multi-threaded code hanging forever with Julia 1.10 #2261

Closed
AaronGhost opened this issue Feb 9, 2024 · 11 comments · Fixed by #2262
Labels
bug Something isn't working

Comments

@AaronGhost

Describe the bug

Thanks for your work on this library. Some code I wrote that combines multi-threading and CUDA hangs forever on Julia 1.10, but runs correctly on Julia 1.9.4.

I manually reduced the code to the best of my ability, using differential testing, while still triggering the bug.

To reproduce

The program hangs forever when using 4, 5, 6, 7 or 8 threads (my core count) with Julia 1.10. The Minimal Working Example (MWE) for this bug is:

using CUDA

function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))

    Threads.@threads for i in axes(data, 5)
        for t in axes(data, 4)
            cu_result[:, :, t, i] .= sum(CuArray(data[:, :, :, t, i]))
        end
    end
end

println("Starting first iteration")
main()
println("First iteration finished")
main()
println("Second iteration finished")

The program finishes normally with Julia 1.9.
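
For anyone reproducing this: since the hang depends on the thread count, it may help to print the count at the top of the script. The following is only a minimal sketch. `report_threads` is a hypothetical helper that is not part of the original MWE, and it assumes the script is started with something like `julia --threads=4 deadlock.jl`.

```julia
# Print the thread count so a reproduction attempt matches the reported setup.
# `report_threads` is a hypothetical helper, not part of the original MWE.
function report_threads()
    n = Threads.nthreads()
    println("Running with ", n, " Julia thread(s)")
    return n
end

report_threads()
```

The thread count can also be set through the `JULIA_NUM_THREADS` environment variable instead of the command-line flag.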

Manifest.toml

[[deps.AbstractFFTs]]
deps = ["LinearAlgebra"]
git-tree-sha1 = "d92ad398961a3ed262d8bf04a1a2b8340f915fef"
uuid = "621f4979-c628-5d54-868e-fcf4e3e8185c"
version = "1.5.0"

    [deps.AbstractFFTs.extensions]
    AbstractFFTsChainRulesCoreExt = "ChainRulesCore"
    AbstractFFTsTestExt = "Test"

    [deps.AbstractFFTs.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    Test = "8dfed614-e22c-5e08-85e1-65c5234f0b40"

[[deps.Adapt]]
deps = ["LinearAlgebra", "Requires"]
git-tree-sha1 = "0fb305e0253fd4e833d486914367a2ee2c2e78d0"
uuid = "79e6a3ab-5dfb-504d-930d-738a2a938a0e"
version = "4.0.1"
weakdeps = ["StaticArrays"]

    [deps.Adapt.extensions]
    AdaptStaticArraysExt = "StaticArrays"

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "Crayons", "DataFrames", "ExprTools", "GPUArrays", "GPUCompiler", "KernelAbstractions", "LLVM", "LLVMLoopInfo", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "NVTX", "Preferences", "PrettyTables", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "StaticArrays", "Statistics"]
git-tree-sha1 = "baa8ea7a1ea63316fa3feb454635215773c9c845"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "5.2.0"

    [deps.CUDA.extensions]
    ChainRulesCoreExt = "ChainRulesCore"
    SpecialFunctionsExt = "SpecialFunctions"

    [deps.CUDA.weakdeps]
    ChainRulesCore = "d360d2e6-b24c-11e9-a2a3-2a2ae2dbcce4"
    SpecialFunctions = "276daf66-3868-5448-9aa4-cd146d93841b"

[[deps.CUDA_Driver_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "Pkg"]
git-tree-sha1 = "d01bfc999768f0a31ed36f5d22a76161fc63079c"
uuid = "4ee394cb-3365-5eb0-8335-949819d2adfc"
version = "0.7.0+1"

[[deps.CUDA_Runtime_Discovery]]
deps = ["Libdl"]
git-tree-sha1 = "2cb12f6b2209f40a4b8967697689a47c50485490"
uuid = "1af6417a-86b4-443c-805f-a4643ffb695f"
version = "0.2.3"

[[deps.CUDA_Runtime_jll]]
deps = ["Artifacts", "CUDA_Driver_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "8e25c009d2bf16c2c31a70a6e9e8939f7325cc84"
uuid = "76a88914-d11a-5bdc-97e0-2f5a05c973a2"
version = "0.11.1+0"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "47e4686ec18a9620850bad110b79966132f14283"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "10.0.2"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "ec632f177c0d990e64d955ccc1b8c04c485a0950"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.6"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "Scratch", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "a846f297ce9d09ccba02ead0cae70690e072a119"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.25.0"

[[deps.JuliaNVTXCallbacks_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "af433a10f3942e882d3c671aacb203e006a5808f"
uuid = "9c1d0b0a-7046-5b2e-a33f-ea22f176ac7e"
version = "0.2.1+0"

[[deps.KernelAbstractions]]
deps = ["Adapt", "Atomix", "InteractiveUtils", "LinearAlgebra", "MacroTools", "PrecompileTools", "Requires", "SparseArrays", "StaticArrays", "UUIDs", "UnsafeAtomics", "UnsafeAtomicsLLVM"]
git-tree-sha1 = "4e0cb2f5aad44dcfdc91088e85dee4ecb22c791c"
uuid = "63c18a36-062a-441e-b654-da1e3ab1ce7c"
version = "0.9.16"

    [deps.KernelAbstractions.extensions]
    EnzymeExt = "EnzymeCore"

    [deps.KernelAbstractions.weakdeps]
    EnzymeCore = "f151be2c-9106-41f4-ab19-57ee4f262869"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Preferences", "Printf", "Requires", "Unicode"]
git-tree-sha1 = "cb4619f7353fc62a1a22ffa3d7ed9791cfb47ad8"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "6.4.2"
weakdeps = ["BFloat16s"]

    [deps.LLVM.extensions]
    BFloat16sExt = "BFloat16s"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "98eaee04d96d973e79c25d49167668c5c8fb50e2"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.27+1"

[[deps.LLVMLoopInfo]]
git-tree-sha1 = "2e5c102cfc41f48ae4740c7eca7743cc7e7b75ea"
uuid = "8b046642-f1f6-4319-8d3c-209ddc03c586"
version = "1.0.0"

[[deps.LLVMOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl"]
git-tree-sha1 = "d986ce2d884d49126836ea94ed5bfb0f12679713"
uuid = "1d63c593-3942-5779-bab2-d838dc0a180e"
version = "15.0.7+0"

[[deps.NVTX]]
deps = ["Colors", "JuliaNVTXCallbacks_jll", "Libdl", "NVTX_jll"]
git-tree-sha1 = "53046f0483375e3ed78e49190f1154fa0a4083a1"
uuid = "5da4648a-3479-48b8-97b9-01cb529c0a1f"
version = "0.3.4"

[[deps.NVTX_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "ce3269ed42816bf18d500c9f63418d4b0d9f5a3b"
uuid = "e98f9f5b-d649-5603-91fd-7774390e6439"
version = "3.1.0+2"

Expected behavior

I expect the program to finish (and cu_result to contain the correct result).
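
As a rough illustration of what "the correct result" means here, the same computation can be sketched on the CPU only, without CUDA. This is an assumption-laden sketch rather than code from the report: `cpu_reference` is a hypothetical name, and the array dimensions are reduced so that it runs quickly.

```julia
# CPU-only reference for the MWE's computation (no CUDA required).
# `cpu_reference` and the reduced dimensions are illustrative assumptions.
function cpu_reference(data::Array{ComplexF32,5})
    nx, ny, _, nt, ni = size(data)
    result = zeros(ComplexF32, nx, ny, nt, ni)
    Threads.@threads for i in axes(data, 5)
        for t in axes(data, 4)
            # Broadcast the scalar sum of one 3-D block over the output slice.
            # Each (t, i) pair writes to a distinct slice, so tasks do not overlap.
            result[:, :, t, i] .= sum(view(data, :, :, :, t, i))
        end
    end
    return result
end

data = rand(ComplexF32, (4, 4, 8, 3, 5))
result = cpu_reference(data)
println(size(result))
```

Comparing `Array(cu_result)` against such a reference (with `isapprox`, since the summation order may differ) would be one way to check the GPU output once the program no longer hangs.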

Version info

Details for Julia 1.10

Julia Version 1.10.0
Commit 3120989f39 (2023-12-25 18:01 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-15.0.7 (ORCJIT, rocketlake)
  Threads: 1 on 16 virtual cores

CUDA version with Julia 1.10

CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.10.0
- LLVM: 15.0.7

1 device:
  0: NVIDIA RTX A5000 (sm_86, 18.249 GiB / 23.988 GiB available)

Details for Julia 1.9

Julia Version 1.9.4
Commit 8e5136fa29 (2023-11-14 08:46 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 16 × 11th Gen Intel(R) Core(TM) i9-11900K @ 3.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, rocketlake)
  Threads: 12 on 16 virtual cores

Details on CUDA (Julia 1.9.4):

CUDA runtime 12.3, artifact installation
CUDA driver 12.3
NVIDIA driver 546.12.0

CUDA libraries:
- CUBLAS: 12.3.4
- CURAND: 10.3.4
- CUFFT: 11.0.12
- CUSOLVER: 11.5.4
- CUSPARSE: 12.2.0
- CUPTI: 21.0.0
- NVML: 12.0.0+546.12

Julia packages:
- CUDA: 5.2.0
- CUDA_Driver_jll: 0.7.0+1
- CUDA_Runtime_jll: 0.11.1+0

Toolchain:
- Julia: 1.9.4
- LLVM: 14.0.6

Preferences:
- CUDA_Runtime_jll.version: 12.3

1 device:
  0: NVIDIA RTX A5000 (sm_86, 20.694 GiB / 23.988 GiB available)

Additional context

Thanks very much for your help! Please let me know if I can help further with this!

AaronGhost added the bug label on Feb 9, 2024
@vchuravy
Member

vchuravy commented Feb 9, 2024

Using the profiler's triggered-during-execution mode (https://docs.julialang.org/en/v1/stdlib/Profile/#Triggered-During-Execution), it looks like we get stuck entering GC because cuOccupancyMaxPotentialBlockSize is blocking:

unknown function (ip: 0x7ffae4162444)
__pthread_rwlock_wrlock at /usr/lib/libc.so.6 (unknown line)
unknown function (ip: 0x7ffa52104347)
unknown function (ip: 0x7ffa51f14b34)
macro expansion at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:4848 [inlined]
#705 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:27
check at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/libcuda.jl:32 [inlined]
cuOccupancyMaxPotentialBlockSize at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/utils/call.jl:26 [inlined]
#launch_configuration#901 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:59 [inlined]
launch_configuration at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:54 [inlined]
#launch_heuristic#1126 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:22 [inlined]
launch_heuristic at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/gpuarrays.jl:15 [inlined]
_copyto! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:56 [inlined]
materialize! at /home/vchuravy/.julia/packages/GPUArrays/Hd5Sk/src/host/broadcast.jl:32 [inlined]
materialize! at ./broadcast.jl:911 [inlined]
macro expansion at ./REPL[4]:7 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7ffaccf94692)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238


unknown function (ip: (nil))
unknown function (ip: 0x7ffae41624ac)
pthread_cond_wait at /usr/lib/libc.so.6 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
unknown function (ip: 0x7ffae411770f)
toInt64 at ./boot.jl:703 [inlined]
Int64 at ./boot.jl:784 [inlined]
convert at ./number.jl:7 [inlined]
_promote at ./promotion.jl:370 [inlined]
promote at ./promotion.jl:393 [inlined]
< at ./promotion.jl:462 [inlined]
> at ./operators.jl:378 [inlined]
compute_threads at /home/vchuravy/.julia/packages/CUDA/6Jmwc/src/mapreduce.jl:222 [inlined]
call_composed at ./operators.jl:1045 [inlined]
call_composed at ./operators.jl:1044 [inlined]
#_#103 at ./operators.jl:1041 [inlined]
ComposedFunction at ./operators.jl:1041 [inlined]
#902 at /home/vchuravy/.julia/packages/CUDA/6Jmwc/lib/cudadrv/occupancy.jl:61
unknown function (ip: (nil))

@vchuravy
Member

vchuravy commented Feb 9, 2024

Could you test #2262 and see if it fixes your issue?

@AaronGhost
Author

Thanks for looking into it! I checked out the branch locally and dev'ed it, but it still deadlocks on Windows. I am happy to run some diagnostics to track this down further, but I am not sure which commands to run. If I understood correctly, the profiler can't be triggered during execution on Windows, and @profile never returns because of the deadlock, unless I am missing something?

@vchuravy
Member

Yeah, Windows makes that harder. If you can somehow get a backtrace for all threads, that would help immensely.

I could reproduce the hang on Linux before, but can't anymore. Maybe you could try WSL?

@AaronGhost
Author

AaronGhost commented Feb 12, 2024

I managed to reproduce a deadlock with WSL. I ran the program with 4 threads this time. The first iteration of main completes but the deadlock happens on the second iteration. I then used the signal method to get the backtrace. The backtrace is below. Let me know if I can do anything else to help!

Backtrace

signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
_jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927
jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f51ba9eafd8)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
task_local_state! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:69
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f51ba9ca1fc)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f515f244454)
unknown function (ip: 0x7f515ef80536)
unknown function (ip: 0x7f515ef81233)
unknown function (ip: 0x7f515ef82eae)
unknown function (ip: 0x7f515f067f34)
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:356 [inlined]
#49 at /path/to/local/CUDA.jl/lib/utils/call.jl:27 [inlined]
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuMemcpyDtoHAsync_v2 at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]
#unsafe_copyto!#8 at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:397 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jl/lib/cudadrv/memory.jl:394
#1055 at /path/to/local/CUDA.jlsrc/array.jl:610
#context!#913 at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:170 [inlined]
context! at /path/to/local/CUDA.jl/lib/cudadrv/state.jl:165 [inlined]
unsafe_copyto! at /path/to/local/CUDA.jlsrc/array.jl:602
copyto! at /path/to/local/CUDA.jlsrc/array.jl:555 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:50
scalar_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:34 [inlined]
_getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:17 [inlined]
getindex at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/indexing.jl:15 [inlined]
macro expansion at /home/username/.julia/packages/GPUArraysCore/GMsgk/src/GPUArraysCore.jl:210 [inlined]
#_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:71
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]
#mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
#_sum#831 at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
#_sum#830 at ./reducedim.jl:1014 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
#sum#828 at ./reducedim.jl:1010 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#2#threadsfor_fun#1 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f51ba9d6bc2)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))          

Collected profile

Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
Thread 1 Task 0x00007f50e87044c0 Total snapshots: 385. Utilization: 100%
   ╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
   ╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
   ╎  385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
   ╎   385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
   ╎    385 @Base/reducedim.jl:1010; sum
   ╎     385 @Base/reducedim.jl:1010; #sum#828
   ╎    ╎ 385 @Base/reducedim.jl:1014; _sum
   ╎    ╎  385 @Base/reducedim.jl:1014; #_sum#830
   ╎    ╎   385 @Base/reducedim.jl:1015; _sum
   ╎    ╎    385 @Base/reducedim.jl:1015; #_sum#831
   ╎    ╎     385 @GPUArrays/src/host/mapreduce.jl:28; mapreduce
   ╎    ╎    ╎ 385 @GPUArrays/src/host/mapreduce.jl:28; #mapreduce#41
   ╎    ╎    ╎  385 @GPUArrays/src/host/mapreduce.jl:33; _mapreduce
   ╎    ╎    ╎   385 @GPUArrays/src/host/mapreduce.jl:71; _mapreduce(f::typeof(identity), op::typeof(Base.add_sum), As::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}; dims::Colon, init::Nothing)
   ╎    ╎    ╎    385 @GPUArraysCore/src/GPUArraysCore.jl:210; macro expansion
   ╎    ╎    ╎     385 @GPUArrays/src/host/indexing.jl:15; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
   ╎    ╎    ╎    ╎ 385 @GPUArrays/src/host/indexing.jl:17; _getindex
   ╎    ╎    ╎    ╎  385 @GPUArrays/src/host/indexing.jl:34; scalar_getindex
   ╎    ╎    ╎    ╎   385 @GPUArrays/src/host/indexing.jl:50; getindex(A::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, I::Int64)
   ╎    ╎    ╎    ╎    385 @CUDA/src/array.jl:555; copyto!
   ╎    ╎    ╎    ╎     385 @CUDA/src/array.jl:602; unsafe_copyto!(dest::Vector{ComplexF32}, doffs::Int64, src::CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, soffs::Int64, n::Int64)
   ╎    ╎    ╎    ╎    ╎ 385 @CUDA/lib/cudadrv/state.jl:165; context!(ctx::CuContext)
   ╎    ╎    ╎    ╎    ╎  385 @CUDA/lib/cudadrv/state.jl:170; #context!#913
   ╎    ╎    ╎    ╎    ╎   385 @CUDA/src/array.jl:610; (::CUDA.var"#1055#1056"{ComplexF32, Vector{ComplexF32}, Int64, CuArray{ComplexF32, 3, CUDA.Mem.DeviceBuffer}, Int64, Int64})()
   ╎    ╎    ╎    ╎    ╎    385 @CUDA/lib/cudadrv/memory.jl:394; kwcall(::@NamedTuple{async::Bool}, ::typeof(unsafe_copyto!), dst::Ptr{ComplexF32}, src::CuPtr{ComplexF32}, N::Int64)
   ╎    ╎    ╎    ╎    ╎     385 @CUDA/lib/cudadrv/memory.jl:397; #unsafe_copyto!#8
   ╎    ╎    ╎    ╎    ╎    ╎ 385 @CUDA/lib/utils/call.jl:26; cuMemcpyDtoHAsync_v2
   ╎    ╎    ╎    ╎    ╎    ╎  385 @CUDA/lib/cudadrv/libcuda.jl:32; check
   ╎    ╎    ╎    ╎    ╎    ╎   385 @CUDA/lib/utils/call.jl:27; #49
384╎    ╎    ╎    ╎    ╎    ╎    385 @CUDA/lib/cudadrv/libcuda.jl:356; macro expansion

Thread 2 Task 0x00007f50e87041a0 Total snapshots: 385. Utilization: 100%

Thread 3 Task 0x00007f50e8704330 Total snapshots: 385. Utilization: 100%
384╎385 @CUDA/lib/cudadrv/state.jl:69; task_local_state!()

Thread 4 Task 0x00007f50e8704650 Total snapshots: 385. Utilization: 100%
   ╎385 @Base/threadingconstructs.jl:153; (::Base.Threads.var"#1#2"{var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, Int64})()
   ╎ 385 @Base/threadingconstructs.jl:181; #2#threadsfor_fun
   ╎  385 @Base/threadingconstructs.jl:214; (::var"#2#threadsfor_fun#2"{var"#2#threadsfor_fun#1#3"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}})(tid::Int64; onethread::Bool)
   ╎   385 /mnt/d/Documents/Julia/deadlock.jl:9; macro expansion
   ╎    385 @Base/abstractarray.jl:1288; getindex
   ╎     385 @Base/multidimensional.jl:889; _getindex
   ╎    ╎ 385 @Base/multidimensional.jl:901; _unsafe_getindex(::IndexLinear, ::Array{ComplexF32, 5}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Base.Slice{Base.OneTo{Int64}}, ::Int64, ::Int64)
   ╎    ╎  385 @Base/abstractarray.jl:828; similar
   ╎    ╎   385 @Base/array.jl:420; similar
   ╎    ╎    385 @Base/boot.jl:488; Array
384╎    ╎     385 @Base/boot.jl:481; Array

Thread 6 Task 0x00007f50c703c010 Total snapshots: 385. Utilization: 0%
   ╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
   ╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
   ╎  385 @Base/condition.jl:125; wait
   ╎   385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   ╎    385 @Base/task.jl:994; wait()
384╎     385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})

Thread 7 Task 0x00007f50c7004010 Total snapshots: 385. Utilization: 100%
   ╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
   ╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
   ╎  385 @Base/condition.jl:125; wait
   ╎   385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   ╎    385 @Base/task.jl:994; wait()
384╎     385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})

Thread 8 Task 0x00007f50c7000010 Total snapshots: 385. Utilization: 0%
   ╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
   ╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
   ╎  385 @Base/condition.jl:125; wait
   ╎   385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   ╎    385 @Base/task.jl:994; wait()
384╎     385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})

Thread 9 Task 0x00007f50c6ff8010 Total snapshots: 385. Utilization: 100%
   ╎385 @CUDA/lib/cudadrv/synchronization.jl:120; synchronization_worker(data::Ptr{Nothing})
   ╎ 385 @CUDA/lib/cudadrv/synchronization.jl:53; take!(f::CUDA.var"#921#926", c::CUDA.BidirectionalChannel{Union{CuContext, CuEvent, CuStream}, CUDA.cudaError_enum})
   ╎  385 @Base/condition.jl:125; wait
   ╎   385 @Base/condition.jl:130; wait(c::Base.GenericCondition{ReentrantLock}; first::Bool)
   ╎    385 @Base/task.jl:994; wait()
384╎     385 @Base/task.jl:985; poptask(W::Base.IntrusiveLinkedListSynchronized{Task})

@vchuravy
Member

Hm but you were able to collect a profile. That means it didn't fully hang at that point.
In the other case I never got a collected profile since we never hit a yield point.

@AaronGhost
Author

AaronGhost commented Feb 12, 2024

I tried the experiment multiple times:

  • On Windows, I can't seem to interrupt the computation.
  • In WSL, I managed to interrupt the computation at some point with Ctrl+C / SIGINT. In most cases I get:
==============================================================
Profile collected. A report will print at the next yield point
==============================================================

^C^C^C^C^C^CWARNING: Force throwing a SIGINT
Segmentation fault

In that case I never get access to the profile. Below is one of the reports I get in this situation:

Backtrace

signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:509
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_gc_safepoint at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:472
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:472
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120
unknown function (ip: 0x7f849a3a0808)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jlcapi_synchronization_worker_13802 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f844007231b)
unknown function (ip: 0x7f843fd7da08)
unknown function (ip: 0x7f843fd7e5eb)
unknown function (ip: 0x7f843fe91946)
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4165 [inlined]
#507 at /path/to/local/CUDA.jl/lib/utils/call.jl:27 [inlined]
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuLaunchKernel at /path/to/local/CUDA.jl/lib/utils/call.jl:26 [inlined]
#888 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:66
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:33 [inlined]
macro expansion at ./none:0 [inlined]
pack_arguments at ./none:0
#launch#887 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:59
launch at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:52 [inlined]
#894 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:175 [inlined]
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:135 [inlined]
macro expansion at ./none:0 [inlined]
convert_arguments at ./none:0 [inlined]
#cudacall#893 at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:177 [inlined]
cudacall at /path/to/local/CUDA.jl/lib/cudadrv/execution.jl:173 [inlined]
macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:266 [inlined]
macro expansion at ./none:0 [inlined]
#call#1085 at ./none:0
unknown function (ip: 0x7f849a39e685)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
call at ./none:0 [inlined]
#_#1100 at /path/to/local/CUDA.jl/src/compiler/execution.jl:389
HostKernel at /path/to/local/CUDA.jl/src/compiler/execution.jl:388 [inlined]
macro expansion at /path/to/local/CUDA.jl/src/compiler/execution.jl:114 [inlined]
#mapreducedim!#1161 at /path/to/local/CUDA.jl/src/mapreduce.jl:271
mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined]
#_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:67
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]
#mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
#_sum#831 at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
#_sum#830 at ./reducedim.jl:1014 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
#sum#828 at ./reducedim.jl:1010 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#54#threadsfor_fun#10 at ./threadingconstructs.jl:214
#54#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f849a3bef52)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /mnt/d/Documents/Julia/deadlock.jl:9 [inlined]
#54#threadsfor_fun#10 at ./threadingconstructs.jl:214
#54#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7f849a3bef52)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7f849a38798c)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
copy at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/abstractarray.jl:75
unknown function (ip: (nil))

==============================================================
Profile collected. A report will print at the next yield point
==============================================================

In a few cases on WSL, I managed to get a profile out by continuing to send interrupt signals. I assume I happened to interrupt one function in particular, but I am really not sure what is going on here. I get something out about once in 10 tries.

==============================================================
Profile collected. A report will print at the next yield point
==============================================================

^C^C^C^C^C^C^C^CWARNING: Force throwing a SIGINT
ERROR: LoadError: InterruptException:
Stacktrace:
  [1] try_yieldto(undo::typeof(Base.ensure_rescheduled))
    @ Base ./task.jl:931
  [2] wait()
    @ Base ./task.jl:995
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock}; first::Bool)
    @ Base ./condition.jl:130
  [4] wait
    @ Base ./condition.jl:125 [inlined]
  [5] _wait(t::Task)
    @ Base ./task.jl:310
  [6] ^Cthreading_run(fun::var"#39#threadsfor_fun#8"{var"#39#threadsfor_fun#7#9"{CuArray{ComplexF32, 4, CUDA.Mem.DeviceBuffer}, Array{ComplexF32, 5}, Base.OneTo{Int64}}}, static::Bool)
    @ Base.Threads ./threadingconstructs.jl:166
  [7] macro expansion
    @ ./threadingconstructs.jl:219 [inlined]
  [8] main()
    @ Main /path/to/test_deadlock.jl:7
  [9] top-level scope
    @ /path/to/test_deadlock.jl:15
 [10] include(fname::String)
    @ Base.MainInclude ./client.jl:489
 [11] top-level scope
    @ REPL[2]:1
 [12] top-level scope
    @ /path/to/local/CUDA.jl/src/initialization.jl:206
in expression starting at /path/to/test_deadlock.jl:15

@maleadt
Member

maleadt commented Feb 12, 2024

Can you try the latest version of the PR (which marks all ccalls as gc-safe)?
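
For readers unfamiliar with the term, here is a minimal sketch of what a "gc-safe" ccall means. This is not the code from the PR. The helper name `gcsafe_usleep` is hypothetical, and the POSIX `usleep` call is only a stand-in for a blocking driver call such as `cuStreamSynchronize`. The idea is to bracket the foreign call with the runtime's `jl_gc_safe_enter` / `jl_gc_safe_leave` (the latter shows up in the backtraces in this thread), so that a GC started on another thread does not have to wait for this thread to reach a safepoint.

```julia
# Hypothetical helper for illustration only; assumes a POSIX libc with `usleep`.
function gcsafe_usleep(microseconds::Integer)
    # Convert arguments before entering the GC-safe region, so that no
    # Julia-side work happens while the thread is marked as GC-safe.
    us = convert(Cuint, microseconds)

    # Mark this thread as being in a GC-safe region: a collection triggered
    # by another thread may proceed while we are blocked in foreign code.
    gc_state = ccall(:jl_gc_safe_enter, Int8, ())

    # Stand-in for a blocking library call (e.g. a CUDA driver call).
    ret = ccall(:usleep, Cint, (Cuint,), us)

    # Restore the previous GC state; if a collection is in progress the
    # thread waits here until it is finished.
    ccall(:jl_gc_safe_leave, Cvoid, (Int8,), gc_state)
    return ret
end

gcsafe_usleep(1_000)
```

While inside the GC-safe region the thread must not allocate or touch Julia objects, which is why the argument conversion happens before `jl_gc_safe_enter`.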

@AaronGhost
Author

Thanks. I tried the latest version of the PR and can no longer make my MWE deadlock on WSL or Windows with Julia 1.10. However, my original code still deadlocks with the latest version of the PR on Julia 1.10, while it finishes normally with 1.9.

I reduced the original code again; the new MWE is very similar to the previous one except for the FFT plan. Below are the new MWE and the backtrace obtained on WSL with 8 threads:

using CUDA
using ChunkSplitters

function main()
    data = rand(ComplexF32, (100, 100, 8, 20, 200))
    cu_result = CUDA.zeros(ComplexF32, (100, 100, 20, 200))
    plans = [CUDA.CUFFT.plan_bfft(CUDA.zeros(ComplexF32, (100, 100, 8)), 1:2) for _ in 1:Threads.nthreads()]

    Threads.@threads for (ichunk, chunk) in enumerate(chunks(axes(data, 5); n=Threads.nthreads()))
        for i in chunk
            for t in axes(data, 4)
                cu_result[:, :, t, i] .= sum(plans[ichunk] * CuArray(data[:, :, :, t, i]))
            end
        end
    end
end

println(getpid())
for i in 1:5
    println("Run $i")
    main()
end
Backtrace

======================================================================================
Information request received. A stacktrace will print followed by a 1.0 second profile
======================================================================================

cmd: /home/username/julia-1.10.0/bin/julia 776 running 15 of 15

signal (10): User defined signal 1
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
multiq_check_empty at ./partr.jl:186
jfptr_multiq_check_empty_75167.1 at /home/username/julia-1.10.0/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
check_empty at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:340 [inlined]
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:388
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120       
unknown function (ip: 0x7fcb27c66c08)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
_jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/threading.c:927
jl_mutex_unlock at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_locks.h:80 [inlined]
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:286
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120       
unknown function (ip: 0x7fcb27c66c08)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined]
jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined]
jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465
macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:204 [inlined]
unchecked_cuStreamSynchronize at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4023 [inlined]
#920 at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:126 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:56
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120       
unknown function (ip: 0x7fcb27c66c08)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277
ijl_task_get_next at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/partr.c:524
poptask at ./task.jl:985
wait at ./task.jl:994
#wait#645 at ./condition.jl:130
wait at ./condition.jl:125 [inlined]
take! at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:53
synchronization_worker at /path/to/local/CUDA.jl/lib/cudadrv/synchronization.jl:120       
unknown function (ip: 0x7fcb27c66c08)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jlcapi_synchronization_worker_13946 at /home/username/.julia/compiled/v1.10/CUDA/oWw5k_lOIDm.so (unknown line)
start_thread at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
clone at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7fcaee2f131b)
unknown function (ip: 0x7fcaedfe2021)
unknown function (ip: 0x7fcaee0e4546)
macro expansion at /path/to/local/CUDA.jl/lib/utils/call.jl:203 [inlined]
macro expansion at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:4848 [inlined]
#705 at /path/to/local/CUDA.jl/lib/utils/call.jl:30
check at /path/to/local/CUDA.jl/lib/cudadrv/libcuda.jl:32 [inlined]
cuOccupancyMaxPotentialBlockSize at /path/to/local/CUDA.jl/lib/utils/call.jl:29
unknown function (ip: 0x7fcb27ca9802)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
#launch_configuration#901 at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:75
launch_configuration at /path/to/local/CUDA.jl/lib/cudadrv/occupancy.jl:60 [inlined]      
#mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:236
mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined]
#mapreducedim!#1160 at /path/to/local/CUDA.jl/src/mapreduce.jl:274
mapreducedim! at /path/to/local/CUDA.jl/src/mapreduce.jl:169 [inlined]
#_mapreduce#43 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:67
_mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:33 [inlined]   
#mapreduce#41 at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
mapreduce at /home/username/.julia/packages/GPUArrays/Hd5Sk/src/host/mapreduce.jl:28 [inlined]
#_sum#831 at ./reducedim.jl:1015 [inlined]
_sum at ./reducedim.jl:1015 [inlined]
#_sum#830 at ./reducedim.jl:1014 [inlined]
_sum at ./reducedim.jl:1014 [inlined]
#sum#828 at ./reducedim.jl:1010 [inlined]
sum at ./reducedim.jl:1010 [inlined]
macro expansion at /local/directory/deadlock.jl:12 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7fcb27c84562)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238      
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7fcaee2f131b)
unknown function (ip: 0x7fcaedffca08)
unknown function (ip: 0x7fcaedffd5eb)
unknown function (ip: 0x7fcaee110946)
unknown function (ip: 0x7fca7440b04c)
unknown function (ip: 0x7fca744302de)
unknown function (ip: 0x7fca7443a879)
unknown function (ip: 0x7fca744272d9)
unknown function (ip: 0x7fca7442b6bc)
unknown function (ip: 0x7fca74455bda)
unknown function (ip: 0x7fca74408fe9)
cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line)
cufftXtExec at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line)
macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined]
#46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined]
retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined]
check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined]
cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29
unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332
unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401
* at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined]
macro expansion at /local/directory/deadlock.jl:12 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7fcb27c84562)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238      
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
#189 at /path/to/local/CUDA.jl/lib/utils/call.jl:30
unknown function (ip: (nil))
_mm_pause at /usr/local/lib/gcc/x86_64-linux-gnu/9.1.0/include/xmmintrin.h:1271 [inlined]
jl_gc_wait_for_the_world at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:242 [inlined]
ijl_gc_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:3502    
maybe_collect at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:937 [inlined]
jl_gc_pool_alloc_inner at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1293 [inlined]
jl_gc_pool_alloc_noinline at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gc.c:1350
jl_gc_alloc_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:477 [inlined]
_new_array_ at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:144 [inlined]
_new_array at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:198 [inlined]
ijl_alloc_array_3d at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/array.c:450
Array at ./boot.jl:481 [inlined]
Array at ./boot.jl:488 [inlined]
similar at ./array.jl:420 [inlined]
similar at ./abstractarray.jl:828 [inlined]
_unsafe_getindex at ./multidimensional.jl:901
_getindex at ./multidimensional.jl:889 [inlined]
getindex at ./abstractarray.jl:1288 [inlined]
macro expansion at /local/directory/deadlock.jl:12 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7fcb27c84562)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238      
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7fcb27c7e9dc)
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
ijl_process_events at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jl_uv.c:277
unknown function (ip: (nil))
pthread_rwlock_wrlock at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
unknown function (ip: 0x7fcaee2f131b)
unknown function (ip: 0x7fcaedffca08)
unknown function (ip: 0x7fcaedffd5eb)
unknown function (ip: 0x7fcaee110946)
unknown function (ip: 0x7fca7440b04c)
unknown function (ip: 0x7fca744302de)
unknown function (ip: 0x7fca7443a879)
unknown function (ip: 0x7fca744272d9)
unknown function (ip: 0x7fca7442b6bc)
unknown function (ip: 0x7fca74455bda)
unknown function (ip: 0x7fca74408fe9)
cufftXtExecDescriptor at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line)
cufftXtExec at /home/username/.julia/artifacts/c0e6b8fff2621303ace1cc360b7fca676b4e28fd/lib/libcufft.so (unknown line)
macro expansion at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:229 [inlined]
#46 at /path/to/local/CUDA.jl/lib/utils/call.jl:30 [inlined]
retry_reclaim at /path/to/local/CUDA.jl/src/pool.jl:370 [inlined]
check at /path/to/local/CUDA.jl/lib/cufft/libcufft.jl:18 [inlined]
cufftExecC2C at /path/to/local/CUDA.jl/lib/utils/call.jl:29
unsafe_execute! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:332
unsafe_execute_trailing! at /path/to/local/CUDA.jl/lib/cufft/fft.jl:401
* at /path/to/local/CUDA.jl/lib/cufft/fft.jl:455 [inlined]
macro expansion at /local/directory/deadlock.jl:12 [inlined]
#2#threadsfor_fun#2 at ./threadingconstructs.jl:214
#2#threadsfor_fun at ./threadingconstructs.jl:181 [inlined]
#1 at ./threadingconstructs.jl:153
unknown function (ip: 0x7fcb27c84562)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076 
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238      
unknown function (ip: (nil))
pthread_cond_wait at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:883
jl_safepoint_wait_gc at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/safepoint.c:173
jl_set_gc_and_wait at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_internal.h:956 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:363 [inlined]
segv_handler at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/signals-unix.c:350
_IO_funlockfile at /lib/x86_64-linux-gnu/libpthread.so.0 (unknown line)
jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:351 [inlined]
jl_gc_state_set at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia_threads.h:344 [inlined]
jl_gc_safe_leave at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/jlapi.c:465
unknown function (ip: (nil))

==============================================================
Profile collected. A report will print at the next yield point
==============================================================

@maleadt
Member

maleadt commented Feb 13, 2024

I extended the PR to cover all libraries, i.e., including cuFFT. Can you test again?

@AaronGhost
Author

Problem fixed with the latest version!
Thank you very much!
