Segmentation fault when importing CUDA #2083

Closed
yuvalwas opened this issue Sep 19, 2023 · 6 comments
Labels
bug (Something isn't working), needs information (Further information is requested)

Comments

yuvalwas commented Sep 19, 2023

Describe the bug

Hello, I'm not sure whether you'll consider this a bug. The documentation on conditional use suggests that users should always be able to import CUDA. However, when I import CUDA on a non-GPU server of my institute's HPC cluster, I get:


[4389] signal (11.1): Segmentation fault
in expression starting at /home/labs/tsodyks/yuvalw/clusterless/wexac_utils/env_setup.jl:5
__init__ at /home/labs/tsodyks/yuvalw/.julia/packages/CUDA/ZdCxS/src/initialization.jl:42
jfptr___init___3368 at /home/labs/tsodyks/yuvalw/.julia/compiled/v1.9/CUDA/oWw5k_lNa62.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
jl_module_run_initializer at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:75
ijl_init_restored_modules at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/module.c:982
register_restored_modules at ./loading.jl:1115
_include_from_serialized at ./loading.jl:1061
_require_search_from_serialized at ./loading.jl:1506
_require at ./loading.jl:1783
_require_prelocked at ./loading.jl:1660
macro expansion at ./loading.jl:1648 [inlined]
macro expansion at ./lock.jl:267 [inlined]
require at ./loading.jl:1611
jfptr_require_45889.clone_1 at /apps/easybd/easybuild/software/Julia/1.9.3-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
call_require at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:466 [inlined]
eval_import_path at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:503
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:731
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:572
eval_body at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:533
jl_interpret_toplevel_thunk at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/interpreter.c:762
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:912
jl_toplevel_eval_flex at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:856
ijl_toplevel_eval_in at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/toplevel.c:971
eval at ./boot.jl:370 [inlined]
include_string at ./loading.jl:1903
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
_include at ./loading.jl:1963
include at ./Base.jl:457
jfptr_include_35036.clone_1 at /apps/easybd/easybuild/software/Julia/1.9.3-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
exec_options at ./client.jl:307
_start at ./client.jl:522
jfptr__start_40034.clone_1 at /apps/easybd/easybuild/software/Julia/1.9.3-linux-x86_64/lib/julia/sys.so (unknown line)
_jl_invoke at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2758 [inlined]
ijl_apply_generic at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/gf.c:2940
jl_apply at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/julia.h:1880 [inlined]
true_main at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:573
jl_repl_entrypoint at /cache/build/default-amdci5-5/julialang/julia-release-1-dot-9/src/jlapi.c:717
main at julia (unknown line)
__libc_start_main at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x4010b8)
Allocations: 5143867 (Pool: 5143009; Big: 858); GC: 10
/scratch/1695121957.576635.shell: line 13:  4389 Segmentation fault      (core dumped) julia wexac_utils/env_setup.jl

To reproduce
To reproduce the above error, I run `using CUDA` or

try
    using CUDA
catch
end

which doesn't help.

Expected behavior

Importing CUDA should not crash. For now, lacking a better solution, I use a global flag to decide whether to import CUDA.
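A segmentation fault is a process-level signal rather than a Julia exception, so a try/catch block cannot intercept it. A flag-based guard like the one described above could look roughly like the sketch below. The environment variable name `USE_CUDA` and the helper `gpu_requested` are hypothetical, chosen only for illustration; neither comes from CUDA.jl.

```julia
# Hypothetical flag: USE_CUDA is an assumed variable name, not part of CUDA.jl.
# The helper accepts any dictionary-like object so it can be checked without
# touching the real environment.
gpu_requested(env = ENV) = lowercase(get(env, "USE_CUDA", "0")) in ("1", "true", "yes")

if gpu_requested()
    # Only reached when the flag is set, so CUDA's __init__ never runs
    # on nodes where the flag is left unset.
    using CUDA
end
```

On a GPU node the script would then be started with the flag set, for example `USE_CUDA=1 julia wexac_utils/env_setup.jl`.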

Manifest.toml

[[deps.CUDA]]
deps = ["AbstractFFTs", "Adapt", "BFloat16s", "CEnum", "CUDA_Driver_jll", "CUDA_Runtime_Discovery", "CUDA_Runtime_jll", "CompilerSupportLibraries_jll", "ExprTools", "GPUArrays", "GPUCompiler", "LLVM", "LazyArtifacts", "Libdl", "LinearAlgebra", "Logging", "Preferences", "Printf", "Random", "Random123", "RandomNumbers", "Reexport", "Requires", "SparseArrays", "SpecialFunctions"]
git-tree-sha1 = "edff14c60784c8f7191a62a23b15a421185bc8a8"
uuid = "052768ef-5323-5732-b1bb-66c8b64840ba"
version = "4.0.1"

[[deps.GPUArrays]]
deps = ["Adapt", "GPUArraysCore", "LLVM", "LinearAlgebra", "Printf", "Random", "Reexport", "Serialization", "Statistics"]
git-tree-sha1 = "2e57b4a4f9cc15e85a24d603256fe08e527f48d1"
uuid = "0c68f7d7-f131-5f86-a1c3-88cf8149b2d7"
version = "8.8.1"

[[deps.GPUArraysCore]]
deps = ["Adapt"]
git-tree-sha1 = "2d6ca471a6c7b536127afccfa7564b5b39227fe0"
uuid = "46192b85-c4d5-4398-a991-12ede77f4527"
version = "0.1.5"

[[deps.GPUCompiler]]
deps = ["ExprTools", "InteractiveUtils", "LLVM", "Libdl", "Logging", "TimerOutputs", "UUIDs"]
git-tree-sha1 = "19d693666a304e8c371798f4900f7435558c7cde"
uuid = "61eb1bfa-7361-4325-ad38-22787b887f55"
version = "0.17.3"

[[deps.LLVM]]
deps = ["CEnum", "LLVMExtra_jll", "Libdl", "Printf", "Unicode"]
git-tree-sha1 = "f044a2796a9e18e0531b9b3072b0019a61f264bc"
uuid = "929cbde3-209d-540e-8aea-75f648917ca0"
version = "4.17.1"

[[deps.LLVMExtra_jll]]
deps = ["Artifacts", "JLLWrappers", "LazyArtifacts", "Libdl", "TOML"]
git-tree-sha1 = "070e4b5b65827f82c16ae0916376cb47377aa1b5"
uuid = "dad2f222-ce93-54a1-a47d-0025e8a3acab"
version = "0.0.18+0"

[[deps.LLVMOpenMP_jll]]
deps = ["Artifacts", "JLLWrappers", "Libdl", "Pkg"]
git-tree-sha1 = "f689897ccbe049adb19a065c495e75f372ecd42b"
uuid = "1d63c593-3942-5779-bab2-d838dc0a180e"
version = "15.0.4+0"

Version info

Details on Julia:

Julia Version 1.9.3
Commit bed2cd540a1 (2023-08-24 14:43 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: 128 × AMD EPYC 7702 64-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-14.0.6 (ORCJIT, znver2)
  Threads: 1 on 128 virtual cores
Environment:
  JULIA_DEPOT_PATH = :
  LD_LIBRARY_PATH = /apps/easybd/easybuild/software/Julia/1.9.3-linux-x86_64/lib:/usr/share/lsf/10.1/linux3.10-glibc2.17-x86_64/lib:/home/labs/testing/almoga/tmp/ncbi-magicblast-1.4.0-src/c++/local/ncbi-vdb-2.9.0-1/lib64

Details on CUDA:

CUDA runtime 11.8, artifact installation
CUDA driver 12.0
NVIDIA driver 525.125.6

Libraries: 
- CUBLAS: 11.11.3
- CURAND: 10.3.0
- CUFFT: 10.9.0
- CUSOLVER: 11.4.1
- CUSPARSE: 11.7.5
- CUPTI: 18.0.0
- NVML: 12.0.0+525.125.6

Toolchain:
- Julia: 1.9.3
- LLVM: 14.0.6
- PTX ISA support: 3.2, 4.0, 4.1, 4.2, 4.3, 5.0, 6.0, 6.1, 6.3, 6.4, 6.5, 7.0, 7.1, 7.2, 7.3, 7.4, 7.5
- Device capability support: sm_35, sm_37, sm_50, sm_52, sm_53, sm_60, sm_61, sm_62, sm_70, sm_72, sm_75, sm_80, sm_86

1 device:
  0: Quadro RTX 8000 (sm_75, 47.449 GiB / 48.000 GiB available)

Additional context

This might be related to #1465.

yuvalwas added the bug label Sep 19, 2023
@maleadt
Member

maleadt commented Sep 19, 2023

> This might be related to #1465.

That seems unlikely; why do you think so?

I'd rather suspect #1798 or so. In any case, you are using an old version of CUDA.jl, so please test again with v4.4 or v5.

maleadt added the needs information label Sep 19, 2023
@yuvalwas
Author

> That seems unlikely; why do you think so?

My comment was mainly based on ignorance; I just came across it while looking for related issues.

> In any case, you are using an old version of CUDA.jl, so please test again with v4.4 or v5

I'm having problems with updating, perhaps you would know how to help?
At first, when I tried to update CUDA I got

pkg> update CUDA
    Updating registry at `C:\Users\yuvalw.WISMAIN\.julia\registries\General.toml`
ERROR: Unsatisfiable requirements detected for package GR_jll [d2c73de3]:
 GR_jll [d2c73de3] log:
 ├─possible versions are: 0.51.2-0.72.9 or uninstalled
 ├─restricted to versions 0.72.9 by an explicit requirement, leaving only versions: 0.72.9
 └─restricted by compatibility requirements with Qt6Base_jll [c0090381] to versions: 0.51.2-0.72.8 or uninstalled — no versions left
   └─Qt6Base_jll [c0090381] log:
     ├─possible versions are: 6.0.3-6.5.2 or uninstalled
     └─restricted to versions 6.5.2 by an explicit requirement, leaving only versions: 6.5.2

For some reason this doesn't show up anymore, but CUDA still won't update.

When I try to be more specific,

(Clusterless) pkg> add CUDA@4.4
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package KernelAbstractions [63c18a36]:
 KernelAbstractions [63c18a36] log:
 ├─possible versions are: 0.1.0-0.9.8 or uninstalled
 ├─restricted to versions * by Clusterless [26ac66cf], leaving only versions: 0.1.0-0.9.8
 │ └─Clusterless [26ac66cf] log:
 │   ├─possible versions are: 0.1.0 or uninstalled
 │   └─Clusterless [26ac66cf] is fixed to version 0.1.0
 ├─restricted by compatibility requirements with CUDA [052768ef] to versions: 0.9.2-0.9.8
 │ └─CUDA [052768ef] log:
 │   ├─possible versions are: 0.1.0-5.0.0 or uninstalled
 │   ├─restricted to versions * by Clusterless [26ac66cf], leaving only versions: 0.1.0-5.0.0
 │   │ └─Clusterless [26ac66cf] log: see above
 │   └─restricted to versions 4.4 by an explicit requirement, leaving only versions: 4.4.0-4.4.1
 └─restricted by compatibility requirements with CUDAKernels [72cfdca4] to versions: 0.8.0-0.8.6 — no versions left
   └─CUDAKernels [72cfdca4] log:
     ├─possible versions are: 0.1.0-0.4.7 or uninstalled
     ├─restricted to versions * by Clusterless [26ac66cf], leaving only versions: 0.1.0-0.4.7
     │ └─Clusterless [26ac66cf] log: see above
     └─restricted by compatibility requirements with CUDA [052768ef] to versions: 0.4.5-0.4.7 or uninstalled, leaving only versions: 0.4.5-0.4.7       
       └─CUDA [052768ef] log: see above

(Clusterless) pkg> add CUDA@5
   Resolving package versions...
ERROR: Unsatisfiable requirements detected for package CUDAKernels [72cfdca4]:
 CUDAKernels [72cfdca4] log:
 ├─possible versions are: 0.1.0-0.4.7 or uninstalled
 ├─restricted to versions * by Clusterless [26ac66cf], leaving only versions: 0.1.0-0.4.7
 │ └─Clusterless [26ac66cf] log:
 │   ├─possible versions are: 0.1.0 or uninstalled
 │   └─Clusterless [26ac66cf] is fixed to version 0.1.0
 └─restricted by compatibility requirements with CUDA [052768ef] to versions: uninstalled — no versions left
   └─CUDA [052768ef] log:
     ├─possible versions are: 0.1.0-5.0.0 or uninstalled
     ├─restricted to versions * by Clusterless [26ac66cf], leaving only versions: 0.1.0-5.0.0
     │ └─Clusterless [26ac66cf] log: see above
     └─restricted to versions 5 by an explicit requirement, leaving only versions: 5.0.0

Thank you!

@maleadt
Member

maleadt commented Sep 19, 2023

You could just try in a temporary, empty environment by using ]activate --temp. In there, there should be no problem installing the latest CUDA.jl.
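For a non-interactive script, the same check can be sketched with the Pkg API instead of the REPL mode. This is only an illustration of the suggestion above; it assumes network access to the package registry.

```julia
using Pkg

# Equivalent of `]activate --temp`: a throwaway environment that does not
# inherit the compat constraints of the existing project.
Pkg.activate(temp = true)

# With no other dependencies present, the resolver is free to pick the
# latest CUDA.jl release available in the registry.
Pkg.add("CUDA")

# The actual test: does loading CUDA still crash on the non-GPU node?
using CUDA
```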

@yuvalwas
Author

You are right, there is no problem in a new environment with only CUDA v5.

@maleadt
Member

maleadt commented Sep 19, 2023

OK, I'm going to assume that this is the same issue as #1798 then, which is fixed in v4.4 and v5.

To upgrade your environment to CUDA.jl v4.4, I think you need to get rid of the CUDAKernels dependency, as that is now provided by CUDA.jl (using CUDA.CUDAKernels).
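A rough sketch of that upgrade path, assuming the project's Project.toml sits in the current directory and lists CUDAKernels as a direct dependency (both are assumptions about the local setup):

```julia
using Pkg

# Assumed location of the project; adjust the path as needed.
Pkg.activate(".")

# Drop the standalone CUDAKernels package, whose compat bounds
# block the newer CUDA.jl versions in the resolver output above.
Pkg.rm("CUDAKernels")

# Request the 4.4 series explicitly, mirroring `add CUDA@4.4`.
Pkg.add(name = "CUDA", version = "4.4")
```

Code that previously did `using CUDAKernels` would then switch to `using CUDA.CUDAKernels`, as mentioned above.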

maleadt closed this as completed Sep 19, 2023
@yuvalwas
Author

Yes, I just reached the same conclusion that the problem is in CUDAKernels. The only reason I have it installed is that KernelAbstractions and CUDAKernels are supposed to be loaded (if I understood correctly) for Tullio to use them. I'll remove it. Thank you for your help.
