Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] GIL lock failure during static initialization of pybind11 from get_internals() within PYBIND11_MODULE macro #2909

Closed
jrmadsen opened this issue Mar 17, 2021 · 6 comments

Comments

@jrmadsen
Copy link

Issue description

Essentially, within Travis CI on macOS builds, any attempt to import my pybind11 module results in:

Fatal Python error: PyMUTEX_LOCK(gil->mutex) failed
Python runtime state: unknown

I believe I've tracked it down to approximately

struct gil_scoped_acquire_local {

The detailed description is at NERSC/timemory#126.

Reproducible example code

I've used pybind11 many, many times in the past and I've never seen this before. I cannot reproduce it on my local macOS, only during CI. If I could reproduce it, I'd fix it myself and create a PR but I can't even get sanitizers to work:

==8692==ERROR: Interceptors are not working. This may be because AddressSanitizer is loaded too late (e.g. via dlopen). Please launch the executable with:
DYLD_INSERT_LIBRARIES=/Applications/Xcode-12.for.macOS.Universal.Apps.beta.2.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/lib/clang/12.0.0/lib/darwin/libclang_rt.asan_osx_dynamic.dylib
"interceptors not installed" && 0

I've tried using the above env but I'm pretty sure its being ignored by macOS because of their security policies.

I've noticed that pybind11_add_module adds -fvisibility=hidden, what is the reasoning behind this? Is it just for faster loading and protection against two modules using the different versions of the same TPL? Or does is also protect provide something important for the pybind11 internals? I ask bc I've had to bypass that bc of a TPL indirectly linked in not working from Python because of it.

Any insight would be helpful.

@wgledbetter
Copy link

Here's a minimal repo that produces the same error for me on an AWS mac instance:
https://github.com/wgledbetter/gil

And here's a stackoverflow question about the same issue:
https://stackoverflow.com/questions/66026520/fatal-python-error-pymutex-lock-pyruntime-ceval-gil-mutex-failed

@rwgk
Copy link
Collaborator

rwgk commented Mar 22, 2021

I'm not an expert on this, just trying to make a connection in hopes that the comments here will help with the background: #2479

I think this will get resolved only if an affected person digs in and figures out a solution (PR).

@jrmadsen
Copy link
Author

jrmadsen commented Mar 22, 2021

Thanks @wgledbetter. Unfortunately, adding -undefined dynamic_lookup to the link flags did not resolve the issue. I am starting to think that this may be a subtle ABI issue. Possibly between the the libcpp which is being installed via conda-forge and the libcpp being used when compiling with apple's clang bc pybind11 details::get_internals() does a lookup on a "id" string which is partially generated from:

/// Also standard libs
#ifndef PYBIND11_STDLIB
#  if defined(_LIBCPP_VERSION)
#    define PYBIND11_STDLIB "_libcpp"
#  elif defined(__GLIBCXX__) || defined(__GLIBCPP__)
#    define PYBIND11_STDLIB "_libstdcpp"
#  else
#    define PYBIND11_STDLIB ""
#  endif
#endif

so another package could be using pybind11, registering the same "id", and later when I try to import, this condition is satisfied: if (builtins.contains(id) && isinstance<capsule>(builtins[id])) but the data loaded by pointer is ABI-incompatible, resulting in a bad lookup on the lock state or something along those line.

@rwgk Although I am now pretty confident that my issue is not associated with symbol visibility, when I figure this out, I'll see what I can do about adding an option to pybind11_add_module which allows one to specify not adding -fvisibility=hidden to the compiler flags (similar to how THIN_LTO works)

@rwgk
Copy link
Collaborator

rwgk commented Mar 22, 2021

Although I am now pretty confident that my issue is not associated with symbol visibility, when I figure this out, I'll see what I can do about adding an option to pybind11_add_module which allows one to specify not adding -fvisibility=hidden to the compiler flags (similar to how THIN_LTO works)

Sounds good. Just to make sure, are you already aware of this?

// Robust support for some features and loading modules compiled against different pybind versions

TBH I don't have a full understanding of how this interacts with -fvisibility=hidden. If you feel you need more background, please let me know here. If we have a focused question, I could try to get Wenzel's attention for enlightenment.

@jrmadsen
Copy link
Author

jrmadsen commented Mar 22, 2021

Ah yeah, I found that very recently.

TBH I don't have a full understanding of how this interacts with -fvisibility=hidden

Adding -fvisibility=hidden is redundant w.r.t. pybind11 internals if the compiler supports that attribute. The side-effect of the compiler flags is that if A links to libfoo.so and B links to libfoo.so, and libfoo.so has some global state data (in my case, the call-stack of the profiler measurements) then, unless libfoo marks those global symbols with __attribute__((visibility("default"))), the compiler flag causes A and B each have their own copy of those symbols. In my case, I provide hooks into other profilers which are mostly compiled into libtimemory and libtimemory is linked to libpytimemory (the python bindings) so any calls into another profiler which are directly linked into the python bindings reference a set of symbols in the python bindings while any calls into the other profiler done within libtimemory reference the set of symbols in that library.

@jrmadsen
Copy link
Author

Shockingly this was fixed by not linking to ${PYTHON_LIBRARIES}, e.g.

target_link_libraries(<target> ${PYTHON_LIBRARIES})

was changed to

if(NOT APPLE)
    target_link_libraries(<target> ${PYTHON_LIBRARIES})
endif()

ddanderson added a commit to wiredtiger/wiredtiger that referenced this issue Apr 11, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants