Change default encoding to utf-8 in normalize_header_key and normalize_header_value functions #3241

Open: ZM25XC wants to merge 1 commit into base: master

@ZM25XC ZM25XC commented Jul 14, 2024

Summary

This Pull Request addresses the encoding errors (UnicodeEncodeError) raised when a str containing non-ASCII characters is passed to the normalize_header_key and normalize_header_value functions in _utils.py, which currently encode with ASCII by default. Changing the default encoding to UTF-8 lets both functions accept a wider range of input values without raising errors.

Changes

  • Updated the default encoding in normalize_header_key and normalize_header_value functions from "ascii" to "utf-8".

Example Code

def normalize_header_key(
    value: str | bytes,
    lower: bool,
    encoding: str | None = None,
) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header key.
    """
    if isinstance(value, bytes):
        bytes_value = value
    else:
        bytes_value = value.encode(encoding or "utf-8")

    return bytes_value.lower() if lower else bytes_value

def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header value.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or "utf-8")

Rationale

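ASCII covers only 128 code points, so any str containing a character outside that range raises UnicodeEncodeError under the current default. UTF-8 is a superset of ASCII: it can encode the full Unicode character set and produces identical bytes for ASCII-only input, so headers that already work are unaffected by the new default.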
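
To make the rationale concrete, here is a minimal sketch of the encoding behaviour the two functions rely on for str input. The helper name encode_header_text is hypothetical and exists only for this illustration; it is not part of _utils.py.

```python
def encode_header_text(text: str, encoding: str) -> bytes:
    """Hypothetical helper: encode a header string the way the
    normalize functions do for str input."""
    return text.encode(encoding)


ascii_only = "Content-Type"
non_ascii = "内容类型"

# ASCII-only input yields identical bytes under both encodings,
# so switching the default does not change existing ASCII headers.
same_bytes = (
    encode_header_text(ascii_only, "ascii")
    == encode_header_text(ascii_only, "utf-8")
)

# Non-ASCII input cannot be represented in ASCII at all.
try:
    encode_header_text(non_ascii, "ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# The same input encodes cleanly as UTF-8 (three bytes per character here).
utf8_bytes = encode_header_text(non_ascii, "utf-8")
```

The point of the comparison is that the change only affects inputs that previously failed; it says nothing about whether a server or intermediary will accept non-ASCII header bytes.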

Additional Context

Here is an example that demonstrates the issue and how the proposed change resolves it:

header_key_unicode = "内容类型"

# With the previous "ascii" default, this call raises UnicodeEncodeError.
# With the new "utf-8" default, it returns the UTF-8 encoded key.
normalized_key_unicode = normalize_header_key(header_key_unicode, lower=True)

# Passing the encoding explicitly works both before and after this change.
normalized_key_unicode_utf8 = normalize_header_key(header_key_unicode, lower=True, encoding="utf-8")
print(normalized_key_unicode_utf8)
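
For a side-by-side view of the old and new defaults, the following self-contained sketch may help. It mirrors the function shown under Example Code, but the name normalize_header_key_sketch and the default_encoding parameter are assumptions added purely so both behaviours can run in one script; the real function has no such parameter.

```python
from typing import Optional, Union


def normalize_header_key_sketch(
    value: Union[str, bytes],
    lower: bool,
    encoding: Optional[str] = None,
    default_encoding: str = "utf-8",  # hypothetical knob, for comparison only
) -> bytes:
    """Illustrative copy of normalize_header_key with a switchable default."""
    if isinstance(value, bytes):
        bytes_value = value
    else:
        bytes_value = value.encode(encoding or default_encoding)
    # bytes.lower() only lowercases ASCII letters; other bytes pass through.
    return bytes_value.lower() if lower else bytes_value


header_key_unicode = "内容类型"

# Old behaviour: the "ascii" default rejects non-ASCII input.
try:
    normalize_header_key_sketch(
        header_key_unicode, lower=True, default_encoding="ascii"
    )
    old_default_raised = False
except UnicodeEncodeError:
    old_default_raised = True

# New behaviour: the "utf-8" default encodes the same input.
new_default_result = normalize_header_key_sketch(header_key_unicode, lower=True)

# bytes input is never re-encoded, so it behaves the same under either default.
bytes_result = normalize_header_key_sketch(b"Content-Type", lower=True)
```

Note that lowercasing happens on the encoded bytes, so with lower=True only ASCII letters in a key are folded; non-ASCII characters are left exactly as encoded.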

Checklist

  • I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
  • I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
  • I've updated the documentation accordingly.
