Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pkg_resources: Merge @overload and TypeVar annotations from typeshed #4390

Merged
merged 5 commits into from
Jun 24, 2024

Conversation

Avasam
Copy link
Contributor

@Avasam Avasam commented May 23, 2024

Summary of changes

Step 3.2 of #2345 (comment)

Once this #4355 and #4391 are merged, pkg_setuptools will be on par with Typeshed

Pull Request Checklist

  • Changes have tests (type-checking)
  • News fragment added in newsfragments/. (no user facing changes yet until pkg_resources is considered typed)
    (See documentation for details)

Copy link
Contributor

@abravalheri abravalheri left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for working on this @Avasam.

@Avasam Avasam force-pushed the typeshed-overload-and-typevar branch from 394194a to 3279406 Compare June 17, 2024 17:36
# Any object works, but let's indicate we expect something like a module (optionally has __loader__ or __file__)
_ModuleLike = Union[object, types.ModuleType]
_ModuleLike = Any
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This annotation looks worse than the previous one, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah but you get variance issues where ZipProvider.__init__ doesn't match _ProviderFactoryType

pkg_resources\__init__.py:2142: error: Argument 2 to "register_loader_type" has incompatible type "Type[ZipProvider]"; expected "Callable[[Union[object, Module]], IResourceProvider]"  [arg-type]

It was already meant as a "you can pass in anything because there's no way to check this with Python's current type system" anyway when used as a parameter type (since you can't type-check for optional members).

The only other option would be to have

_ModuleLike = Union[object, types.ModuleType]
# Any: Should be _ModuleLike but we end up with variance issues
_ProviderFactoryType = Callable[[Any], "IResourceProvider"]

Copy link
Contributor

@abravalheri abravalheri Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ummm 🤔.

I would understand if ZipProvider was a subclass of IResourceProvider... however IResourceProvider is a protocol. That should eliminate the problem with the variance of the result type, no?

Then both ZipProvider.__init__ and Callable[[_ModuleLike], "IResourceProvider"] explicitly accept _ModuleLike as argument type.

Copy link
Contributor Author

@Avasam Avasam Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pyright gives us a more detailed error here:

Argument of type "type[ZipProvider]" cannot be assigned to parameter "provider_factory" of type "_ProviderFactoryType" in function "register_loader_type"
  Type "type[ZipProvider]" is incompatible with type "_ProviderFactoryType"
    Parameter 1: type "_ModuleLike" is incompatible with type "_ZipLoaderModule"
      Type "_ModuleLike" is incompatible with type "_ZipLoaderModule"
        "object" is incompatible with protocol "_ZipLoaderModule"
          "__loader__" is not present [reportArgumentType](https://github.com/microsoft/pyright/blob/main/docs/configuration.md#reportArgumentType)

So the following might indeed be more appropriate:

_ModuleLike = Union[object, types.ModuleType]
# Any: Should be _ModuleLike but we end up with issues where _ModuleLike doesn't have _ZipLoaderModule's `__loader__`
_ProviderFactoryType = Callable[[Any], "IResourceProvider"]

(either way the first param of _ProviderFactoryType functionally becomes Any anyway)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would also affect the _provider_factories variable right?

Would it make sense to add _ZipLoaderModule in the _ModuleType union?

Copy link
Contributor Author

@Avasam Avasam Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to add _ZipLoaderModule in the _ModuleType union?

If you do that, you force all module-like params to statically have a __loader__ attribute. When the point was that it's an optional attribute (ie: getattr(module, '__loader__', None))

@@ -1582,6 +1703,8 @@ def run_script(self, script_name: str, namespace: dict[str, Any]):
**locals()
),
)
if not self.egg_info:
raise TypeError("Provider is missing egg_info", self.egg_info)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This if never runs, right?

The reasoning is that when self.egg_info is falsey, then self.has_metadata(script) is also falsey, so the first exception always run and the if is never reached...

Copy link
Contributor Author

@Avasam Avasam Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check could be hit if a subclass overwrote has_metadata in a way that doesn't garrantee self.egg_info is truthy or that has_metadata called by run_script will always fail (like FileMetadata could hit it if it didn't have the name == 'PKG-INFO' check since the base run_script forces name to start with scripts/.

There's just no way to statically communicate that has_metadata checks for self.egg_info (it could be a type guard if egg_info was a parameter of has_metadata).

Funnily enough, the Metadata test class in pkg_resources/tests/test_resources.py shows exactly how this can be hit:

>>> from pkg_resources.tests.test_resources import Metadata
>>> Metadata(["scripts/",""]).run_script("", {})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "setuptools\pkg_resources\__init__.py", line 1708, in run_script
    script_filename = self._fn(self.egg_info, script)
  File "setuptools\pkg_resources\__init__.py", line 1744, in _fn
    return os.path.join(base, *resource_name.split('/'))
  File "C:\Program Files\Python39\lib\ntpath.py", line 78, in join
    path = os.fspath(path)
TypeError: expected str, bytes or os.PathLike object, not NoneType

Which becomes:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "setuptools\pkg_resources\__init__.py", line 1707, in run_script
    raise TypeError("Provider is missing egg_info", self.egg_info)
TypeError: ('Provider is missing egg_info', None)

I can alternatively do if not self.has_metadata(script) or not self.egg_info:, but I went with a TypeError to keep the exact same behaviour just in case someone downstream did override has_metadata in a way that triggers this error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I see, the problem is related to the way the implementation is forcing os.path.join(self.egg_info, ...) because in theory if has_metadata/get_metadata are implemented by a subclass in a way that does not depend egg_info, that would be fine...

I suppose that if the subclass overwrites _fn so that os.path.join is not called, then it would also be OK.

Copy link
Contributor Author

@Avasam Avasam Jun 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also brings the question, could egg_info be initialized to an empty string instead of None ? All of setuptool's code checks for Truthynes of it, never for an explicit None. It's also an issue that will come up again as much code that currently isn't type-checked by mypy (functions are missing annotations) or my pyright branch (error currently disabled to pass as-is initially) makes the assumption that egg_info is not None w/o checking.

That would count as a breaking change though (if someone did x.egg_info is not None). So that can be a conversation for later. The None egg_info violations by pyright could also just be due to the currently inferred type caused by initialize_options

Copy link
Contributor

@abravalheri abravalheri Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm... difficult.
I feel that we should not add this exception in this part of the code.

If the major concern is that inheritance can be used so that self.egg_info is None. Then inheritance also can be used to change def _fn to handle the None case right? (and all the other methods that rely on self.egg_info).

Adding this hard check here would then be a backwards incompatible change...


If inheritance is out of the table, then self.egg_info is ensured to not be None there.

This also brings the question, could egg_info be initialized to an empty string instead of None ?

Haven't tested that. Python functions treat the "" directory as CWD right? So maybe it would work... but I don't know.

Copy link
Contributor Author

@Avasam Avasam Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Last commit applies your suggestion. (it's like somewhere in between a TypeError and a NotImplementedError (for a specific type) )

We could consider coercing None to "" in _fn since that seems to follow the logic where most code checks for an empty string first, and run_script was already letting and empty string pass. But is technically a change in behaviour.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively if this is the only point of contention left for this PR, I can undo the parameter change for _fn since it's not public and will get the checker passing. Splitting off the conversation about what to do with a none egg_info in run_script or None base in _fn

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

I think we can leave the implementation as per your last commit indefinitely.

raise NotImplementedError(
"Can't perform this operation for unregistered loader type"
)

def _fn(self, base, resource_name: str):
def _fn(self, base: str, resource_name: str):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change to base: str adds extra limitations that were not there before (in terms of inheritance and Liskov substitution).

Right now this code behaves as base: str | None if I understand correctly, and intentionally "let it crash" (on os.path.join) when base is indeed None.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@abravalheri abravalheri merged commit 48ce5ed into pypa:main Jun 24, 2024
20 of 22 checks passed
@Avasam Avasam deleted the typeshed-overload-and-typevar branch June 24, 2024 16:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants