Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial free threaded bindings #4421

Open
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

ngoldbaum
Copy link
Contributor

Refs #4265

  • Makes nox -s ffi-check pass on the free-threaded build
  • Adds a build config visible to PyO3 for the free-threaded build
  • Adds bindings for PyMutex (see https://docs.python.org/3.13/c-api/init.html#c.PyMutex)
  • Adds initial CI config with continue_on_error set for the test run.

noxfile.py Outdated
@@ -32,6 +33,7 @@
PYO3_DOCS_TARGET = PYO3_TARGET / "doc"
PY_VERSIONS = ("3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13")
PYPY_VERSIONS = ("3.7", "3.8", "3.9", "3.10")
IS_FREE_THREADED = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure this is the "correct" way to do this in nox

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this makes it conditional on the host python, but really what we care about is the target python, I think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as far as our use of nox is concerned, these are the same. Is there an alternative?

It looks like the reason this was added is because we made the abi3 build on free-threaded a hard failure. Maybe we should relax that and then we don't need a change in the noxfile for free-threading?

"cargo:rustc-cfg=Py_3_12".to_owned(),
"cargo:rustc-cfg=Py_3_13".to_owned(),
"cargo:rustc-cfg=Py_GIL_DISABLED".to_owned(),
"cargo:rustc-cfg=py_sys_config=\"Py_GIL_DISABLED\"".to_owned(),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we define both configs in this case? Or only the top-level Py_GIL_DISABLED one?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's go for just the top-level Py_GIL_DISABLED 👍. (You'll need to special-case this when printing the BuildFlags. as cfgs, I guess)

Please document in pyo3-build-config/lib.rs (in the docstring for use_pyo3_cfgs).

return _Py_IMMORTAL_REFCNT;
}
let shared = (AtomicIsize::from((*ob).ob_ref_shared)).load(Relaxed);
local as Py_ssize_t + Py_ssize_t::from(shared >> _Py_REF_SHARED_SHIFT)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the only kind of hairy bit, since there's no function exposed in the API to get the refcount

This is my first time dealing with rust atomics. I have no idea if it's safe to create them inline and then do an atomic load like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AtomicU32::from((*ob).ob_ref_local) creates a new atomic whose changes will not be reflected in the original ob_ref_local. So this is definitely wrong. Atomic::from_ptr would be more appropriate.

Anyway I would feel better if we just used the C functions to do refcounting, similar to #4311 (comment). I'm sure @davidhewitt will have thoughts on this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately there is no function for this in the API, only a static inline function.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #3357 we discussed maybe we should just remove this API.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still might need an implementation though if we need to expose it in the FFI for users of PyO3, e.g. as @davidhewitt implies in #3357 (comment).

My understanding is that the plan for 3.14 is to either expose a function for this or to expose a function that provides similar functionality without needing to expose refcounts directly. Something like Py_IsOnlyReference for refcount==1 and otherwise everything else becomes an implementation detail.

Unfortunately that doesn't help us for 3.13.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we probably do need to keep this API exposed (even after #4065 we use it for debugging and I think others might want to too). I guess that given it's a static inline function, let's just go down the Atomic::from_ptr route and hope that CPython can provide a C function in 3.14.

Comment on lines 1 to 13
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct PyMutex {
pub _bits: u8,
}
Copy link
Member

@mejrs mejrs Aug 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It'd be wrong to derive Copy and/or Clone, the documentation explicitly says it should not be moved.

The Debug impl + not using atomics means the debug impl would race.

The field is also not public in the C api, I'm inclined to keep it that way here as well.

Suggested change
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct PyMutex {
pub _bits: u8,
}
#[repr(C)]
#[derive(Default)]
pub struct PyMutex {
_bits: u8,
_pin: PhantomPinned,
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that won't work because PyObject needs those traits. At least if we want to say that ob_mutex is a PyObject as I have it in this version of the PR. Maybe better to write the PyObject bindings with ob_mutex as a u8. It's an internal, private CPython field and we wouldn't be using it otherwise.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should probably remove Copy & Clone from PyObject in that case, those traits sound to me like an accident waiting to happen anyway 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Default implementation seems a bit risky to me because of the existence of std::mem::take. I'm not sure _pin will save us from that unless we don't give users a way to get at &mut PyMutex, which e.g. they could just do with:

let mut m = PyMutex::default();
let moved = std::mem::take(&mut mutex);

This would also be a move-related bug:

let mut m = PyMutex::default();
PyMutex_Lock(&mut m);
let moved = m;

If we want to allow users to construct a PyMutex, I think we need to remove the Default implementation and have something like

impl PyMutex {
    pub fn new() -> Pin<Box<PyMutex>>
}

... but I also wonder whether that's overkill and we should just

  1. not bother with _pin, and
  2. not give users a way to construct PyMutex for now (we will need it inside the crate for PyObject_HEAD_INIT, but we can just make pub(crate) _bits).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure _pin will save us

The PhantomPinned itself doesn't do anything. It lets us build safe abstractions on top of it.

If we want to allow users to construct a PyMutex

We can remove the Default impl and document that users can create it with mem::zeroed, which is what CPython's documentation does.

This would also be a move-related bug:

A safe api would probably do something like this (to follow the example of having a PyMutex on the stack), which is possible because of the PhantomPinned.

let mutex: Pin<&mut PyMutex> = pin!(PyMutex {
     _bits: 0,
     _pin: PhantomPinned
 });
    
 lock(mutex.as_ref());

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately that won't work because PyObject needs those traits

@ngoldbaum we can still have a Debug impl, it just needs a manual impl that correctly accesses the PyMutex value (or, ideally it should just be an AtomicU8).

I agree with removing the Copy and Clone impls.

noxfile.py Outdated
@@ -32,6 +33,7 @@
PYO3_DOCS_TARGET = PYO3_TARGET / "doc"
PY_VERSIONS = ("3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13")
PYPY_VERSIONS = ("3.7", "3.8", "3.9", "3.10")
IS_FREE_THREADED = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this makes it conditional on the host python, but really what we care about is the target python, I think?

Comment on lines 330 to 332
if build_flags.0.contains(&BuildFlag::Py_GIL_DISABLED) && abi3 {
bail!("Cannot set Py_LIMITED_API and Py_GIL_DISABLED at the same time")
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in some cases we simply warn and disable Py_LIMITED_API (e.g., PyPy), do we have a principled reason to do one vs. the other?

Copy link
Contributor Author

@ngoldbaum ngoldbaum Aug 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In 3.13 Py_GIL_DISABLED implies Py_LIMITED_API isn't set and I wanted a way to enforce that. Warning and disabling the limited API gives the same effect.

I see there's early-exits in some of these functions for pypy and graal. I think can do something similar for free-threaded.

return _Py_IMMORTAL_REFCNT;
}
let shared = (AtomicIsize::from((*ob).ob_ref_shared)).load(Relaxed);
local as Py_ssize_t + Py_ssize_t::from(shared >> _Py_REF_SHARED_SHIFT)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In #3357 we discussed maybe we should just remove this API.

Copy link
Member

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping out with this! Clearly some sticky cases, overall this is looking great already :)

return _Py_IMMORTAL_REFCNT;
}
let shared = (AtomicIsize::from((*ob).ob_ref_shared)).load(Relaxed);
local as Py_ssize_t + Py_ssize_t::from(shared >> _Py_REF_SHARED_SHIFT)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that we probably do need to keep this API exposed (even after #4065 we use it for debugging and I think others might want to too). I guess that given it's a static inline function, let's just go down the Atomic::from_ptr route and hope that CPython can provide a C function in 3.14.

Comment on lines 1 to 13
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct PyMutex {
pub _bits: u8,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Default implementation seems a bit risky to me because of the existence of std::mem::take. I'm not sure _pin will save us from that unless we don't give users a way to get at &mut PyMutex, which e.g. they could just do with:

let mut m = PyMutex::default();
let moved = std::mem::take(&mut mutex);

This would also be a move-related bug:

let mut m = PyMutex::default();
PyMutex_Lock(&mut m);
let moved = m;

If we want to allow users to construct a PyMutex, I think we need to remove the Default implementation and have something like

impl PyMutex {
    pub fn new() -> Pin<Box<PyMutex>>
}

... but I also wonder whether that's overkill and we should just

  1. not bother with _pin, and
  2. not give users a way to construct PyMutex for now (we will need it inside the crate for PyObject_HEAD_INIT, but we can just make pub(crate) _bits).

pyo3-ffi/src/cpython/weakrefobject.rs Outdated Show resolved Hide resolved
pyo3-ffi/src/object.rs Outdated Show resolved Hide resolved
pyo3-ffi/src/object.rs Outdated Show resolved Hide resolved
@@ -507,8 +557,13 @@ extern "C" {

#[inline(always)]
pub unsafe fn Py_INCREF(op: *mut PyObject) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Potentially can we leave a TODO or fixme to use inline reference counting on freethreaded builds? (Here and in Py_DECREF.)

If it helps us eke out a tiny bit of performance, it might be a nice thing to do eventually.

noxfile.py Outdated
@@ -32,6 +33,7 @@
PYO3_DOCS_TARGET = PYO3_TARGET / "doc"
PY_VERSIONS = ("3.7", "3.8", "3.9", "3.10", "3.11", "3.12", "3.13")
PYPY_VERSIONS = ("3.7", "3.8", "3.9", "3.10")
IS_FREE_THREADED = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as far as our use of nox is concerned, these are the same. Is there an alternative?

It looks like the reason this was added is because we made the abi3 build on free-threaded a hard failure. Maybe we should relax that and then we don't need a change in the noxfile for free-threading?

"cargo:rustc-cfg=Py_3_12".to_owned(),
"cargo:rustc-cfg=Py_3_13".to_owned(),
"cargo:rustc-cfg=Py_GIL_DISABLED".to_owned(),
"cargo:rustc-cfg=py_sys_config=\"Py_GIL_DISABLED\"".to_owned(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think let's go for just the top-level Py_GIL_DISABLED 👍. (You'll need to special-case this when printing the BuildFlags. as cfgs, I guess)

Please document in pyo3-build-config/lib.rs (in the docstring for use_pyo3_cfgs).

@mejrs
Copy link
Member

mejrs commented Aug 9, 2024

I've been giving it some thought and I think we really should use atomic integers (in PyMutex and the relevant refcount fields).

The code to correctly access them is rather error prone. Suppose a user wants to read the vvalue of a PyObject's PyMutex field.

They will have to write (and figure out that this is how they have to write it):

let x: *mut PyObject= ...;
let value = AtomicU8::from_ptr(addr_of_mut!((*x).ob_mutex)).load(Ordering::Relaxed);

rather than

let value = *x.ob_mutex;

and if they want to write to it they must also avoid creating mutable references, which is another poorly known footgun.

@ngoldbaum
Copy link
Contributor Author

@colesbury I suspect you might have an opinion about how to wrap the new PyObject fields in PyO3 and whether or not using rust atomics in the PyObject bindings makes sense.

@colesbury
Copy link

I've been giving it some thought and I think we really should use atomic integers (in PyMutex and the relevant refcount fields)

Yeah, I think using atomic integers for the fields is a good idea.

I'm not sure if there's much value for exposing PyMutex in PyO3. The ob_mutex field shouldn't be locked directly. If you want to lock a Python object, you should use the critical section API, which takes a PyObject* argument.

For more generic locking, you can use PyMutex or Rust locking primitives. The only advantage of PyMutex is that it saves the thread-state before blocking if it needs to wait.

@davidhewitt
Copy link
Member

Note that using atomic values for the fields probably means we want to consider portable_atomic as per #3619, but I think that crate uses a global lock as a fallback which is actually unsound here given the FFI.

Alternatively on platforms where std doesn't have atomic types we could just define our own opaque wrapper which users cannot access.

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Aug 9, 2024

I think CPython requires atomics support. Are there use-cases for a version of pyo3 on hosts where CPython can't be compiled?

@colesbury
Copy link

Are there platforms where std doesn't have any atomic types? #3614 sounds like an issue using 64-bit atomic field on a 32-bit PowerPC. None of the PyObject fields are 64-bits on 32-bit platforms.

@davidhewitt
Copy link
Member

That's a good question; I wouldn't know the answer to that without looking through the rust source tree (and maybe others could define their own platforms). In practice it may not be a concern we have to worry about for these cases. We could punt on worrying about that until someone actually reports an issue for their platform, I guess 👀

@davidhewitt
Copy link
Member

I think CPython requires atomics support. Are there use-cases for a version of pyo3 on hosts where CPython can't be compiled?

Ah just seen this - in that case I think we can assume using atomics here is fine 👍

@ngoldbaum
Copy link
Contributor Author

It turns out there's one more issue adding PyMutex to the FFI bindings reveals: PyObject_HEAD_INIT should maybe be a function instead of a constant? It being a constant is problematic if one of the fields has interior mutability.

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Aug 14, 2024

It turns out, for whatever reason, only a debug free-threaded python build successfully runs the pyo3 tests, at least on my Mac's development environment using pyenv. See python/cpython#122918 (comment). Just sharing in case anyone else wants to try this out and sees mysterious crashes with tracebacks that include Py_InitializeEx.

#[cfg(Py_GIL_DISABLED)]
ob_gc_bits: 0,
#[cfg(Py_GIL_DISABLED)]
ob_ref_local: AtomicU32::new(0),

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be immortal (u32::MAX) I think

#[repr(C)]
pub struct PyMutex {
_bits: AtomicU8,
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at Rust's built-in mutex types, I think we'd want a constructor like:

impl PyMutex {
    pub const fn new() -> PyMutex {
        PyMutex { _bits: AtomicU8::new(0) }
    }
}

#[cfg(Py_GIL_DISABLED)]
_padding: 0,
#[cfg(Py_GIL_DISABLED)]
ob_mutex: unsafe { mem::zeroed::<PyMutex>() },

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be PyMutex::new() if we define it as above.

That avoids the unsafe block. And maybe allows PyObject_HEAD_INIT to remain a constant? Are there other fields with interior mutability?

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Aug 14, 2024

Hmm, maybe this is what @davidhewitt is referring to above in #4421 (comment) with the bit about pub(crate) _bits. Let me see if I can get that working...

@ngoldbaum
Copy link
Contributor Author

ngoldbaum commented Aug 14, 2024

It looks like the lint triggers because of the atomic fields, so declaring the PyMutex field as pub(crate) is related but unfortunately doesn't fix the complaint about interior mutability.

@mejrs
Copy link
Member

mejrs commented Aug 14, 2024

Can we not just silence the lint? The copy is intentional.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants