Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature Request] Deep O(1) type-checking of @dataclasses.dataclass fields on assignment #391

Open
GithubCamouflaged opened this issue May 28, 2024 · 5 comments

Comments

@GithubCamouflaged
Copy link

GithubCamouflaged commented May 28, 2024

Hi,

Very much taken by your library's functionality, especially the 'wire-once-and-forget' setup: much appreciated!

I have a gut-feeling this is a noob question, but either I don't have a solid grasp of the functionality offered by your library, or I cannot get seemingly basic functionality to work, that is: type enforcement of object attributes.

Going by this very simple implementation, I can verify type enforcement is performed on method calls, as configured in __init__.py per the documentation.

main.py:

from package.foo import Foo
obj = Foo(bar=0)

package/foo.py:

from dataclasses import dataclass

@dataclass
class Foo():
   bar:str = 'default'

Resulting in an (expected) exception on the generated Foo.__init__(bar:str='default') method (courtesy of Python dataclasses):

.venvuser@localhost:~/test$ python3 main.py 
Traceback (most recent call last):
  File "/home/user/test/main.py", line 2, in <module>
    obj = Foo(bar=0)
          ^^^^^^^^^^
  File "<@beartype(package.foo.Foo.__init__) at 0x7f197e047d80>", line 29, in __init__
beartype.roar.BeartypeCallHintParamViolation: Method package.foo.Foo.__init__() parameter bar=0 violates type hint <class 'str'>, as int 0 not instance of str.

However, changing main.py as follows does not yield the same result, as I would expect by said documentation stating Beartype now implicitly type-checks all annotated classes, callables, and variable assignments:

from cbr.foo import Foo
obj = Foo(bar='bar')
print(f"obj.bar: {obj.bar} (type={type(obj.bar)}).")
obj.bar = []    # <- I would expect this to yield an exception (list is assigned to ensure type is not coerced).
print(f"obj.bar: {obj.bar} (type={type(obj.bar)}).")    # <- This is certainly not in line with the type hint on the attribute.

However, by now I am getting the sneaky suspicion I am doing something fundamentally wrong, as I also cannot get this example to work either, changing the content of package/foo.py as follows, yielding the same output as above:

from dataclasses import dataclass
from beartype.vale import IsInstance
from typing import Annotated

IsUnicodeStrWithBeartype = Annotated[object, IsInstance[str]]

@dataclass
class Foo():
    bar:IsUnicodeStrWithBeartype = 'default'

I've also attempted to decorate with beartype (heading the warning about decorator order) and use the different BeartypeStrategys to no avail.

So... are my expectations wrong, or am I mishandling your library somehow?

@leycec leycec changed the title Question: type hints for attributes/variables. [Feature Request] Deep O(1) type-checking of @dataclasses.dataclass fields on assignment May 29, 2024
@leycec
Copy link
Member

leycec commented May 29, 2024

tl;dr

The tl;dr is that you are not wrong. You're very much right. @beartype currently lacks support for deep O(1) (i.e., constant-time) type-checking of @dataclasses.dataclass fields on assignment. Why? Because @leycec lazily plays Japanese role-playing games instead of coding. This is why our very Universe is in peril.

All hope is not lost, however. @leycec will surely stop playing Japanese role-playing games someday and actually code something. Until that fateful day, you can do what @leycec should have been doing all along...

Behold! A Poor Man's @dataclasses.dataclass Field Assignment Type-checker!

Believe it or not, it's true:

from beartype import beartype
from beartype.door import (
    die_if_unbearable,
    is_bearable,
)
from dataclasses import dataclass

@beartype
@dataclass
class Foo():
    bar: str = 'default'

    def __setattr__(self, attr_name: str, attr_value: object) -> None:
        '''
        Set the dataclass attribute with the passed name to the passed value.

        Parameters
        ----------
        attr_name : str
            Name of the attribute to be set.
        attr_value : object
            Value to set this attribute to.
        '''

        # Field (i.e., "dataclass.Field" object) with this name if this
        # attribute is that of an existing field *OR* "None" otherwise.
        attr_field = self.__dataclass_fields__.get(attr_name)

        # If...
        if (
            # This field exists *AND*...
            attr_field is not None and
            # This field is annotated by a type hint...
            attr_field.type is not None
        ):
            #FIXME: Refactor both the is_bearable() and die_if_unbearable()
            #functions to accept an optional "cls_stack" parameter. For now,
            #edge-case type hints requiring this parameter will fail. *sigh*
            #
            #See also this relevant open issue:
            #    https://github.com/beartype/beartype/issues/364
            # # Type stack encapsulating this dataclass instance, required to
            # # validate edge-case type hints (e.g., "typing.Self", relative
            # # self-referential forward references).
            # cls_stack = (type(self),)

            # If the new passed value to be assigned to this field violates this
            # type hint, raise an exception.
            #
            # Note that the is_bearable() function is considerably faster than
            # the die_if_unbearable() function and thus called first. 
            if not is_bearable(attr_value, attr_field.type):
                die_if_unbearable(
                    obj=attr_value,
                    hint=attr_field.type,

                    # Human-readable substring prefixing the exception message
                    # of this type-checking violation to be raised.
                    exception_prefix=(
                        f'Dataclass instance {repr(self)} '
                        f'field "{attr_name}" '
                    )
                )
            # Else, this value satisfies this type hint.
        # Else, this field does *NOT* exist. This may or may *NOT* constitute an
        # error. In either case, allow the __setattr__() dunder method called
        # below to validate this issue.

        # Set this attribute by deferring to the superclass implementation.
        super().__setattr__(attr_name, attr_value)

obj = Foo(bar='bar')
obj.bar = 'This way something wickedly typing comes.'
print(obj.bar)
obj.bar = []
print(obj.bar)

...which prints:

This way something wickedly typing comes.
Traceback (most recent call last):
  File "/home/leycec/tmp/mopy.py", line 74, in <module>
    obj.bar = []
    ^^^^^^^
  File "<@beartype(__main__.Foo.__setattr__) at 0x7ff8c580e520>", line 32, in __setattr__
  File "/home/leycec/tmp/mopy.py", line 52, in __setattr__
    die_if_unbearable(
  File "/home/leycec/py/beartype/beartype/door/_doorcheck.py", line 106, in die_if_unbearable
    func_raiser(obj)  # pyright: ignore[reportUnboundVariable]
    ^^^^^^^^^^^^^^^^
  File "<@beartype(__beartype_checker_1) at 0x5653431facd0>", line 20, in __beartype_checker_1
beartype.roar.BeartypeDoorHintViolation: Dataclass instance
Foo(bar='This way something wickedly typing comes.') field "bar" value [] violates type hint
<class 'str'>, as list [] not instance of str.

"why. why oh gods why did i choose @beartype."

You may now be thinking that. While you (...and your codebase) weep, I chortle. I've been both ruminating and dreading this very topic for literally years now. Nobody else cared. Yet I knew that someone would care, someday. You are that someone, @GithubCamouflaged. This is that day.

To be sure, it's an exciting topic; full-blown @dataclasses.dataclass integration pushes @beartype into peer-parity with Pydantic. Go for the throat, @beartype! </grrrrrrrrrrrrrr>

That said, it's also a brutal topic. There's a reason that (...basically) the only industrial-scale dataclass validation framework that anyone would actually want to use in production is Pydantic. There's enormous incentive to want to go there. Therefore, @beartype should go there. Still, going there will necessitate at least a month (...but probably more, if I'm being honest) of full-time open-source volunteerism.

The central issue here is that @beartype absolutely must not break backward compatibility. @beartype has been integrated into PyTorch and literally everything else at this point. Preserving compatibility is paramount. Sadly, preserving compatibility here will (...probably) prove to be super non-trivial. What follows is mostly a "list of gotchas" to myself:

  • Frozen dataclasses (i.e., @dataclass(frozen=True)). Since frozen dataclasses also internally leverage the __setattr__() dunder method, @beartype must take care to ignore frozen dataclasses. I don't even particularly know how to portably detect frozen dataclasses without violating privacy encapsulation at the moment. 😮‍💨
  • Existing __setattr__() implementations. For similar reasons, @beartype must take care to preserve existing __setattr__() implementations in third-party dataclasses. Malicious God of Monkey-Patching, I invoke thee! 🧙
  • PEP 591 support (i.e., typing.Final[...]). @beartype's __setattr__() implementation must explicitly raise type-checking violations on erroneous attempts to reassign fields annotated by typing.Final[...].
  • typing.ClassVar[...]. Possibly not a concern. No idea. In any case, @beartype's __setattr__() implementation may need to additionally detect attempts to type-check class fields annotated by typing.ClassVar[...]. It's unclear to me at the moment exactly what this entails. Ignorance, I dispel thee! 🪄
  • typing.InitVar[...]. Possibly not a concern. No idea. In any case, @beartype's __setattr__() implementation must silently ignore all fields annotated by typing.InitVar[...].
  • is_bearable() and die_if_unbearable(). As commented above, both of these statement-level runtime type-checkers need to be augmented to optionally accept a new cls_stack: TypeStack = None parameter. Until we do so, @beartype's __setattr__() implementation will raise spurious (i.e., bad) exceptions on attempting to assign fields annotated by edge-case type hints like typing.Self and relative self-referential forward references (e.g., muh_field: 'Foo').

Probably lots of other tedious and boring minutiae I'm neglecting, too. This is Pydantic territory. Muscling in on Pydantic's home turf was never gonna be easy. But @beartype isn't about easy. @beartype is about doing what's hot and messy and recklessly endangering @leycec's tenuous grip on sanity for cheap and easy Internet kudos! Yeah!


Left: Try-Hard Beartype. Right: probably Pydantic.

@GithubCamouflaged
Copy link
Author

Hi @leycec,

A sad day indeed to be validated (some light-hearted pun intended) in my believes! However, very kind of you to sacrifice valued gaming time, mixed with some of your endless supply of wit, to provide more background and even a poor-persons solution ;) Albeit tempting, it does indeed make me weary to put this to production...

That is not to say, I am not a believer! The question therefor becomes: how do we approach this quest?

You've already identified some potential showstoppers that we can break down and tackle. The most challenging of this, to me, seems (backward) compatibility. Am I naive in thinking your unit tests would cover this, or would this require additional integration testing with dependants (sadly, this link seems down atm)? I am trying to gauge what is required of a (representative) development environment to work on this, and cover this risk. Additionally: is there documentation available for contributors? P.S.: honesty dictates I must admit such enthusiasm in me tends to wane over time, just to set some expectations on my involvement in this noble quest...

To clarify, I feel 'muscling in' serves a valid purpose here: to me (but correct me if I am wrong) there are different design philosophies to beartype and Pydantic. One is being loosely-coupled/implicit (if you use wire-and-forget in __init__.py) versus being tightly-coupling/explicit (requiring subclassing BaseModel or using a custom dataclass). The other is separating responsibilities: enforcing type(hint)s has, in my opinion, nothing to do with validation of values (a place where libraries like icontract and Deal shine), nor serialization (want XML as well? More tight-coupling for you my friend). I won't dispute Pydantic is impressive, battle-tested, and widely used, but it does require locking into their ecosystem. To me, beartype would provide an alternative to avoid this 'vendor lock-in'.

@leycec
Copy link
Member

leycec commented May 31, 2024

Fetch side quest intensifies. Let's RPG this.

it does indeed make me weary to put this to production...

Indeed, your wariness is justified. Production deserves better. When even the package maintainer is begging you to "...don't do it, by all the Gods!", the thin ice you're skating on is even thinner than GitHub Copilot thought. And GitHub Copilot will do anything!

😄 -> ☠️

honesty dictates I must admit such enthusiasm in me tends to wane over time

...heh. You're too nice. You also don't want to go here. Deep down, you know the @beartype codebase to be an exhausting pit of doom you won't soon recover from. Nobody deep-dives into @beartype more than once. Most, not even once. There are reasons for this.

That said, preserving backward compatibility for the moment is mostly trivial. Hear me out. I am boring; yet you feel compelled to listen. We:

  • Expose a new optional configuration option BeartypeConf(is_check_pep557_assign: bool = False) governing this feature.

Since this option is disabled by default and probably undocumented for years to come, basically nobody except you will ever enable this option. When enough years depressingly fly past like time itself dilating into the voracious black hole that is a metaphor for humanity's fleeting mortality, we flip this default from False to True under the (...probably disastrously incorrect) assumption that this feature now behaves as advertised and fully preserves backward compatibility.

I'm Listening and I Kinda Like What I'm Hearing... Kinda

Right? Exciting plan, right? Here's how we start:

  • Everybody either patiently waits a few months for @leycec to finish his laborious 200-hour playthrough of "Persona 3: Reload" or...
  • Somebody who is not me submits one or more PRs! In order, these things would benefit from being done:
    1. Define BeartypeConf(is_check_pep557_assign: bool = False). This should be mostly trivial. I'm joking, of course. It will be Hell. Subtasks include:

      1. In the private beartype._conf.confcls submodule, introduce the is_check_pep557_assign option.
      2. In the private beartype._conf.conftest submodule, validate this option in the existing die_if_conf_kwargs_invalid() function.
      3. In the test-specific beartype_test.a00_unit.a40_api.conf.test_confcls submodule, test that this option superficially behaves as expected.
    2. Actually do something useful with this option. Let's be brutal. This is definitely less trivial. Thankfully, this option is disabled by default. We were wise. Our initial implementation of this option no longer needs to be perfect. It just has to work "good enough" to cover basic use cases. I submit without evidence that the __setattr__() dunder method defined above vaguely satisfies certain hand-wavy definitions of "good enough." Subtasks include:

      1. In the private beartype._decortype submodule, refactor the existing beartype_type() decorator to:
        1. Detect when the passed user-defined class is a dataclass. Thankfully, this is trivial. The following one-line tester probably suffices... probably:

          def is_type_dataclass(cls: type) -> bool:
              '''
              :data:`True` only if the passed class is a **dataclass**
              (i.e., decorated by the standard :obj:`dataclasses.dataclass` decorator).
              '''
          
              return isinstance(cls, type) and hasattr(cls, '__dataclass_fields__')
        2. When the passed user-defined class is a dataclass:

          1. Detect when this dataclass is frozen. According to this StackOverflow thread, it looks like the canonical solution is a similar one-line tester resembling:

            def is_type_dataclass_frozen(cls: type) -> bool:
                '''
                :data:`True` only if the passed class is a **frozen dataclass**
                (i.e., decorated by the standard :obj:`dataclasses.dataclass` decorator
                passed the optional `frozen=True` parameter).
                '''
            
                return is_type_dataclass(cls) and cls__dataclass_params__.frozen  # <-- lol
          2. When this dataclass is frozen, silently reduce to a noop. Run, bear! Run!

          3. Else, detect when this dataclass already defines a __setattr__() dunder method (e.g., with a trivial hasattr(cls, '__setattr__') call). If so, again silently reduce to a noop. Run even harder, bear! Run!

          4. Else, actually do something useful by monkey-patching the __setattr__() dunder method defined above into this class with a simple call to our existing @beartype-specific set_type_attr() utility function (e.g., set_type_attr(cls, '__setattr__', _our_super_cool_setattr_method)).

    3. Rejoice!

@GithubCamouflaged
Copy link
Author

...heh. You're too nice. You also don't want to go here. Deep down, you know the @beartype codebase to be an exhausting pit of doom you won't soon recover from. Nobody deep-dives into @beartype more than once. Most, not even once. There are reasons for this.

Well, every once in a while you feel like a trip to Mordor is in order (whatever you have to tell yourself to justify it). I have a sneaky suspicion I'll have to scratch that itch in the near future, so thank you for the work breakdown structure!

As you've already defined a beartype.BeartypeStrategy for it, I recon beartype.beartype (and perhaps beartype.BeartypeConf) would have to be made aware of the new state of affairs as well? I thought I'd address that first somewhere in the coming weeks.

@leycec
Copy link
Member

leycec commented Jun 7, 2024

...heh. You're much too kind and generous for a trip to Mordor. Oh, very well! If you'd like to tour the hideous sights, I much recommend the beartype._conf.confcls.BeartypeConf.__new__() method as your first (and probably final) destination. Ideally, we'd like to add a new optional is_check_pep557_assign parameter resembling:

class BeartypeConf(object):
    ...
    def __new__(
        ...
        is_check_pep557_assign: bool = False,
        ...
    ) -> 'BeartypeConf':

Basically, just grep for the existing optional is_debug parameter in that submodule and copy-and-paste similar logic for this new is_check_pep557_assign parameter. And... I'm spent. "Sleep now," said the type-checking raven. 😴

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants