dlopen("libcudart") results in duplicate libraries #1814

Closed
maleadt opened this issue Mar 20, 2023 · 7 comments
Labels
bug (Something isn't working), installation (CUDA is easy to install, right?)

Comments

@maleadt
Member

maleadt commented Mar 20, 2023

julia> using CUDA_Runtime_jll

julia> using Libdl
julia> filter(lib->occursin("cuda", lib), Libdl.dllist())
3-element Vector{String}:
 "/home/tim/Julia/depot/artifacts" ⋯ 25 bytes ⋯ "fc058e42039b075f/lib/libcuda.so"
 "/opt/cuda/lib64/libnvrtc.so"
 "/home/tim/Julia/depot/artifacts" ⋯ 27 bytes ⋯ "a005cf1a3deb91/lib/libcudart.so"

julia> Libdl.dlopen("libcudart")
Ptr{Nothing} @0x0000000001cc61f0

julia> filter(lib->occursin("cuda", lib), Libdl.dllist())
4-element Vector{String}:
 "/home/tim/Julia/depot/artifacts" ⋯ 25 bytes ⋯ "fc058e42039b075f/lib/libcuda.so"
 "/opt/cuda/lib64/libnvrtc.so"
 "/home/tim/Julia/depot/artifacts" ⋯ 27 bytes ⋯ "a005cf1a3deb91/lib/libcudart.so"
 "/opt/cuda/lib64/libcudart.so"

@staticfloat This is very surprising to me, and seems to break the whole premise of JLLs eagerly dlopening libraries so that they are discoverable afterwards without mucking with the library path. Any thoughts?

cc @vchuravy

maleadt added the bug label on Mar 20, 2023
@maleadt
Member Author

maleadt commented Mar 20, 2023

LD_DEBUG=libs,file

julia> Libdl.dlopen("libcudart")
     14709:
     14709:	file=libcudart [0];  dynamically loaded by /home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/../lib/julia/libjulia-internal.so.1 [0]
     14709:	find library=libcudart [0]; searching
     14709:	 search path=/home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/../lib/julia:/home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/../lib/julia/..		(RUNPATH from file /home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/julia)
     14709:	  trying file=/home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/../lib/julia/libcudart
     14709:	  trying file=/home/tim/Julia/depot/juliaup/julia-1.8.5+0.x64.linux.gnu/bin/../lib/julia/../libcudart
     14709:	 search cache=/etc/ld.so.cache
     14709:	 search path=/usr/lib		(system search path)
     14709:	  trying file=/usr/lib/libcudart

So the dynamic linker knows about the copy we've loaded, but doesn't use it.

@maleadt
Member Author

maleadt commented Mar 20, 2023

SONAMEs:

❯ objdump -p /home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so | grep SONAME
  SONAME               libcudart.so.12

❯ objdump -p /opt/cuda/lib64/libcudart.so | grep SONAME
  SONAME               libcudart.so.12

maleadt added the installation label on Mar 20, 2023
@maleadt
Member Author

maleadt commented Mar 20, 2023

Ah, the reason is that the JLLs dlopen with the full SONAME, libcudart.so.12, while I was dlopening with a shorter, unversioned name. Specifying the full SONAME makes the dynamic loader return the already-loaded library:

julia> using CUDA_Runtime_jll, Libdl
[ Info: Precompiling CUDA_Runtime_jll [76a88914-d11a-5bdc-97e0-2f5a05c973a2]

julia> println.(filter(lib->occursin("cuda", lib), Libdl.dllist()))
/usr/lib/libcuda.so.1
/opt/cuda/lib64/libnvrtc.so
/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so
3-element Vector{Nothing}:
 nothing
 nothing
 nothing

julia> Libdl.dlpath(Libdl.dlopen("libcudart.so.12"))
"/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so"

julia> println.(filter(lib->occursin("cuda", lib), Libdl.dllist()))
/usr/lib/libcuda.so.1
/opt/cuda/lib64/libnvrtc.so
/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so
3-element Vector{Nothing}:
 nothing
 nothing
 nothing

This is still surprising to me though, because with the shorter name we end up loading a second, identically-named copy (i.e., with exactly the same SONAME) of a library that's already been loaded.
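To check which names the loader matches against an already-loaded library without loading anything new, something like the following sketch might help. It relies on the RTLD_NOLOAD flag; the helper name loaded_path is made up for illustration. In the situation described above, one would expect only the full SONAME to resolve to the artifact copy, but the actual results depend on the system's loader and search paths.

```julia
using Libdl

# Hypothetical helper: return the path of an already-loaded library that the
# loader associates with `name`, or `nothing` if there is no such library.
# RTLD_NOLOAD asks the loader not to load anything that isn't loaded yet.
function loaded_path(name::AbstractString)
    handle = Libdl.dlopen(name, Libdl.RTLD_LAZY | Libdl.RTLD_NOLOAD;
                          throw_error=false)
    handle === nothing && return nothing
    path = Libdl.dlpath(handle)   # query the path before releasing the handle
    Libdl.dlclose(handle)         # drop the extra reference taken by dlopen
    return path
end

# Compare the names discussed in this thread.
for name in ("libcudart", "libcudart.so", "libcudart.so.12")
    println(name, " => ", loaded_path(name))
end
```

Running this after `using CUDA_Runtime_jll` shows, per name, whether a later plain dlopen would reuse the loaded copy or go searching for another file.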

@maleadt
Member Author

maleadt commented Mar 20, 2023

So in summary:

julia> using Libdl

julia> Libdl.dlopen("/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so")
Ptr{Nothing} @0x00000000017a5020

julia> Libdl.dlpath(Libdl.dlopen("libcudart.so"))
"/opt/cuda/lib64/libcudart.so"

vs

julia> using Libdl

julia> Libdl.dlopen("/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so")

julia> Libdl.dlpath(Libdl.dlopen("libcudart.so.12"))
"/home/tim/Julia/depot/artifacts/28605e58122d8c44f2ec2875aaa005cf1a3deb91/lib/libcudart.so"

@staticfloat
Contributor

Yes; in general, you have to use the SONAME on Linux, which is why all of the BB-built JLLs embed the SONAME in the source code. On macOS you have to use the "dylib ID", which we rewrite to look like @rpath/libfoo.2.dylib. On Windows, you just use the basename.
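As a rough illustration of the per-platform rule, here is a hypothetical helper that picks which name to hand to dlopen. The function name and keyword arguments are invented for this sketch and are not part of any JLL; the example values reuse the placeholder libfoo from above, and the Linux and Windows names are made up.

```julia
# Hypothetical helper: choose the lookup name for the current platform.
# All three names are supplied by the caller; nothing is derived from a real JLL.
function lookup_name(; soname::String, dylib_id::String, basename::String)
    if Sys.isapple()
        return dylib_id    # the "dylib ID", e.g. "@rpath/libfoo.2.dylib"
    elseif Sys.iswindows()
        return basename    # just the file's basename
    else
        return soname      # the SONAME recorded in the ELF file
    end
end

name = lookup_name(soname = "libfoo.so.2",             # illustrative
                   dylib_id = "@rpath/libfoo.2.dylib",
                   basename = "libfoo-2.dll")          # illustrative
println(name)
```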

@maleadt
Member Author

maleadt commented Mar 21, 2023

The problem is that this is incompatible with existing software that does dlopen("libcudart"). For stuff we build in Yggdrasil we either link directly or patch the dlopen, as @vchuravy had to do with PAPI, but for interop with, e.g., Python packages that use CUDA, this is a problem.

@staticfloat
Contributor

One way that I’ve solved this in the past is to have symlinks with no versioning in the name that point to the main library, then ensure that that symlink is on the library path.

But this then gets load-order dependent, which is not very fun.
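A minimal sketch of the symlink approach, using a stand-in file rather than a real libcudart. The helper name link_unversioned and the libexample file names are hypothetical. Note that Libdl.DL_LOAD_PATH only affects Julia's own dlopen; other software in the process goes through the system loader's search path (e.g. LD_LIBRARY_PATH, which is read when the process starts).

```julia
using Libdl

# Hypothetical helper: create `linkdir/linkname` as a symlink to `libpath`
# and return the path of the link.
function link_unversioned(libpath::AbstractString, linkdir::AbstractString,
                          linkname::AbstractString)
    mkpath(linkdir)
    link = joinpath(linkdir, linkname)
    islink(link) && rm(link)          # replace a stale link from an earlier run
    symlink(abspath(libpath), link)
    return link
end

# Stand-in for a versioned library file; a real setup would point at the
# library shipped in the artifact directory instead.
dir = mktempdir()
target = joinpath(dir, "libexample.so.12")
touch(target)
link = link_unversioned(target, joinpath(dir, "links"), "libexample.so")

# Make the directory holding the unversioned name visible to Julia's dlopen.
pushfirst!(Libdl.DL_LOAD_PATH, dirname(link))
```

Whether the unversioned name then resolves to the intended copy still depends on what was loaded first, which is the load-order caveat mentioned above.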

@maleadt maleadt closed this as completed Nov 6, 2023