PEP 658/714 metadata files incorrectly probe encoding #9048

Closed
thatch opened this issue Feb 27, 2024 · 4 comments · Fixed by #9049
Labels
area/peps Related to PEP support/compliance area/solver Related to the dependency resolver kind/bug Something isn't working as expected

Comments

@thatch
Contributor

thatch commented Feb 27, 2024

Description

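https://packaging.python.org/en/latest/specifications/core-metadata/ says that metadata is always UTF-8. When computing the metadata hash, Poetry decodes the response to text and re-encodes it. The decoding step can run chardet with UTF-8 disabled (I don't know why), which treats the content as some 8-bit encoding, so the re-encoded bytes no longer match the bytes that were served.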

metadata_hash = getattr(hashlib, hash_name)(
response.text.encode()
).hexdigest()

This should just hash the bytes directly.
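To illustrate why the round trip through text changes the hash, here is a minimal sketch that does not touch Poetry or the network. The payload and the ISO-8859-9 guess are assumptions chosen for illustration (the latter loosely based on the chardet output in the logs below); only hashlib and the standard codecs are used.

```python
import hashlib

# Hypothetical metadata payload with one non-ASCII character, served as UTF-8.
raw = "Author: José Example\n".encode("utf-8")

# The index advertises the hash of the bytes exactly as served.
expected = hashlib.sha256(raw).hexdigest()

# Simulate a wrong charset guess: decode the UTF-8 bytes as ISO-8859-9
# (an assumed guess), then re-encode as UTF-8 like `.text.encode()` would.
mis_decoded = raw.decode("iso-8859-9")
round_tripped = hashlib.sha256(mis_decoded.encode()).hexdigest()

# Hashing the raw bytes directly never depends on any charset guess.
direct = hashlib.sha256(raw).hexdigest()

print("direct matches:       ", direct == expected)
print("round trip matches:   ", round_tripped == expected)
```

With pure-ASCII metadata the two hashes would agree under most guesses, which is presumably why the mismatch only shows up for some packages.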

Workarounds

It falls back to loading wheels instead of .metadata, so this is just annoying in logs and a slight performance hit.

Poetry Installation Method

pipx

Operating System

OS X

Poetry Version

Poetry (version 1.8.1)

Poetry Configuration

n/a

Python Sysconfig

n/a

Example pyproject.toml

No response

Poetry Runtime Logs

[urllib3:urllib3.connectionpool] https://<redacted>:443 "GET /packages/10096/uvicorn-0.22.0-py3-none-any.whl.metadata HTTP/1.1" 200 6263
[filelock:filelock] Attempting to acquire lock 4506771856 on /Users/timhatch/Library/Caches/pypoetry/cache/repositories/<redacted>/_http/e/3/5/b/0/e35b011b0ef41d4c24f3769c87a2bb906339710c93d61593cd0496dd.lock
[filelock:filelock] Lock 4506771856 acquired on /Users/timhatch/Library/Caches/pypoetry/cache/repositories/<redacted>/_http/e/3/5/b/0/e35b011b0ef41d4c24f3769c87a2bb906339710c93d61593cd0496dd.lock
[filelock:filelock] Attempting to release lock 4506771856 on /Users/timhatch/Library/Caches/pypoetry/cache/repositories/<redacted>/_http/e/3/5/b/0/e35b011b0ef41d4c24f3769c87a2bb906339710c93d61593cd0496dd.lock
[filelock:filelock] Lock 4506771856 released on /Users/timhatch/Library/Caches/pypoetry/cache/repositories/<redacted>/_http/e/3/5/b/0/e35b011b0ef41d4c24f3769c87a2bb906339710c93d61593cd0496dd.lock
[chardet:chardet.charsetprober] SHIFT_JIS Japanese prober hit error at byte 6044
[chardet:chardet.charsetprober] EUC-JP Japanese prober hit error at byte 4457
[chardet:chardet.charsetprober] EUC-KR Korean prober hit error at byte 4457
[chardet:chardet.charsetprober] CP949 Korean prober hit error at byte 4457
[chardet:chardet.charsetprober] Big5 Chinese prober hit error at byte 4458
[chardet:chardet.charsetprober] EUC-TW Taiwan prober hit error at byte 4457
[chardet:chardet.charsetprober] Johab Korean prober hit error at byte 4457
[chardet:chardet.charsetprober] utf-8 not active
[chardet:chardet.charsetprober] SHIFT_JIS not active
[chardet:chardet.charsetprober] EUC-JP not active
[chardet:chardet.charsetprober] GB2312 Chinese confidence = 0.01
[chardet:chardet.charsetprober] EUC-KR not active
[chardet:chardet.charsetprober] CP949 not active
[chardet:chardet.charsetprober] Big5 not active
[chardet:chardet.charsetprober] EUC-TW not active
[chardet:chardet.charsetprober] Johab not active
[chardet:chardet.charsetprober] windows-1251 Russian confidence = 0.01
[chardet:chardet.charsetprober] KOI8-R Russian confidence = 0.01
[chardet:chardet.charsetprober] ISO-8859-5 Russian confidence = 0.01
[chardet:chardet.charsetprober] MacCyrillic Russian confidence = 0.0
[chardet:chardet.charsetprober] IBM866 Russian confidence = 0.0
[chardet:chardet.charsetprober] IBM855 Russian confidence = 0.01
[chardet:chardet.charsetprober] ISO-8859-7 Greek confidence = 0.0
[chardet:chardet.charsetprober] windows-1253 Greek confidence = 0.0
[chardet:chardet.charsetprober] ISO-8859-5 Bulgarian confidence = 0.01
[chardet:chardet.charsetprober] windows-1251 Bulgarian confidence = 0.01
[chardet:chardet.charsetprober] TIS-620 Thai confidence = 0.01
[chardet:chardet.charsetprober] ISO-8859-9 Turkish confidence = 0.5362886107447785
[chardet:chardet.charsetprober] windows-1255 Hebrew confidence = 0.0
[chardet:chardet.charsetprober] windows-1255 Hebrew confidence = 0.06253883270643365
[chardet:chardet.charsetprober] windows-1255 Hebrew confidence = 0.031269416353216825
Source (<redacted>): Metadata file hash (2bb01c9eb03dbd9c7ce84fbefd798cb73c9b1df39d1d9d08ac166622b963954e) does not match expected hash (0cdbc425e69ca8aacab788fed9479a0bdb7a6f0ca16529e9903146e99173ed85). Metadata file for uvicorn-0.22.0-py3-none-any.whl will be ignored.
@thatch thatch added kind/bug Something isn't working as expected status/triage This issue needs to be triaged labels Feb 27, 2024
@dimbleby
Contributor

i.e. the suggestion is

@@ -165,7 +165,7 @@ class HTTPRepository(CachedRepository):
                     )
                 ):
                     metadata_hash = getattr(hashlib, hash_name)(
-                        response.text.encode()
+                        response.content
                     ).hexdigest()
                     if metadata_hash != link.metadata_hashes[hash_name]:
                         self._log(

?

sounds plausible

@thatch thatch mentioned this issue Feb 27, 2024
@thatch
Contributor Author

thatch commented Feb 27, 2024

I fixed almost the same bug last week in pypi-simple: jwodder/pypi-simple#22

Yes, that's the fix; I spent 10x as long trying to figure out how to test it.

@dimbleby
Contributor

I am not so sure that what you are doing in #9049 is a good test. As you say, the spec insists that the metadata is utf-8, so making some metadata be not utf-8 is taking us further from reality.

Myself, I would have been tempted just to submit the fix and see if I could get away with that...

@radoering radoering added area/solver Related to the dependency resolver area/peps Related to PEP support/compliance and removed status/triage This issue needs to be triaged labels Feb 28, 2024

github-actions bot commented Apr 2, 2024

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Apr 2, 2024