Change default encoding to utf-8 in `normalize_header_key` and `normalize_header_value` functions #3241

ZM25XC · 2024-07-14T13:13:55Z

Summary

This pull request fixes the encoding errors (UnicodeEncodeError) raised by the normalize_header_key and normalize_header_value functions in _utils.py. These functions fall back to ASCII when a str contains non-ASCII characters and no encoding is given. Changing the default encoding to UTF-8 lets them handle a wider range of input values without raising.

Changes

Changed the fallback encoding that normalize_header_key and normalize_header_value use when no encoding argument is passed, from "ascii" to "utf-8".

Example Code

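# Needed for the `str | bytes` annotations on Python versions before 3.10.
from __future__ import annotations


def normalize_header_key(
    value: str | bytes,
    lower: bool,
    encoding: str | None = None,
) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header key.
    """
    if isinstance(value, bytes):
        # Bytes are passed through untouched.
        bytes_value = value
    else:
        # An explicit `encoding` wins; otherwise fall back to UTF-8 (previously "ascii").
        bytes_value = value.encode(encoding or "utf-8")

    # bytes.lower() folds only ASCII letters A-Z; multi-byte UTF-8 sequences are left as-is.
    return bytes_value.lower() if lower else bytes_value


def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header value.
    """
    if isinstance(value, bytes):
        return value
    # Same fallback as above: the explicit encoding if given, else UTF-8.
    return value.encode(encoding or "utf-8")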
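The sketch below restates the two proposed functions in condensed form so it runs on its own. It then exercises the three input paths: ASCII str, pre-encoded bytes, and str with an explicit encoding. The latin-1 call only illustrates that an explicit encoding still overrides the default. It is not something this PR adds.

```python
from __future__ import annotations


def normalize_header_key(value: str | bytes, lower: bool, encoding: str | None = None) -> bytes:
    # Restated from the proposal: explicit encoding wins, otherwise UTF-8.
    bytes_value = value if isinstance(value, bytes) else value.encode(encoding or "utf-8")
    return bytes_value.lower() if lower else bytes_value


def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    # Restated from the proposal: bytes pass through, str is encoded.
    return value if isinstance(value, bytes) else value.encode(encoding or "utf-8")


# ASCII-only key: case is folded, and the bytes match what the old "ascii" default produced.
print(normalize_header_key("Content-Type", lower=True))  # b'content-type'
# Pre-encoded bytes are returned unchanged.
print(normalize_header_value(b"caf\xe9"))  # b'caf\xe9'
# An explicit encoding still overrides the new default...
print(normalize_header_value("café", encoding="latin-1"))  # b'caf\xe9'
# ...while plain str input now falls back to UTF-8 instead of raising.
print(normalize_header_value("café"))  # b'caf\xc3\xa9'
```

Callers that need a specific byte representation can keep passing `encoding=` or pre-encoded bytes. The new default only affects str input that arrives without an encoding.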

Rationale

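Using UTF-8 as the default lets these functions accept a much wider range of str input without raising. UTF-8 can represent any Unicode character, while ASCII covers only the 128 code points U+0000 to U+007F. UTF-8 encodes those 128 characters to the same single bytes as ASCII, so headers that were already ASCII-only produce identical output after this change.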
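To make the rationale concrete, here is a small standalone comparison of the old and new fallback encodings using plain `str.encode`. `compare_defaults` is a hypothetical helper written only for this illustration and is not part of httpx.

```python
from __future__ import annotations


def compare_defaults(text: str) -> tuple[bytes | None, bytes]:
    """Return (ASCII bytes, or None if ASCII cannot encode the text; UTF-8 bytes)."""
    utf8_bytes = text.encode("utf-8")
    try:
        ascii_bytes = text.encode("ascii")
    except UnicodeEncodeError:
        # Any code point above U+007F is outside ASCII's 128-character range.
        ascii_bytes = None
    return ascii_bytes, utf8_bytes


for text in ["content-type", "café", "内容类型"]:
    ascii_bytes, utf8_bytes = compare_defaults(text)
    if ascii_bytes is None:
        print(f"{text!r}: ascii fails, utf-8 gives {utf8_bytes!r}")
    else:
        # Pure-ASCII text encodes identically, so existing headers are unaffected.
        print(f"{text!r}: same bytes under both, {utf8_bytes!r}")
```

Each of the four CJK characters in "内容类型" takes three bytes in UTF-8, so the encoded key is 12 bytes long.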

Additional Context

Here is an example that demonstrates the issue and how the proposed change resolves it:

header_key_unicode = "内容类型"

# With the previous "ascii" default this call raised UnicodeEncodeError.
# With the new "utf-8" default it succeeds.
normalized_key_unicode = normalize_header_key(header_key_unicode, lower=True)

# Passing the encoding explicitly produces the same bytes as the new default.
normalized_key_unicode_utf8 = normalize_header_key(header_key_unicode, lower=True, encoding="utf-8")
print(normalized_key_unicode_utf8)  # The UTF-8 bytes of "内容类型"; lower() leaves them unchanged.
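The checklist mentions tests, but they are not shown in this description. Below is a hypothetical pytest-style sketch of what such tests could check; the test names are invented here, and `normalize_header_key` is restated so the file runs standalone. Note the last test: `bytes.lower()` folds only ASCII letters, so a non-ASCII uppercase character such as "É" keeps its original UTF-8 bytes.

```python
from __future__ import annotations


def normalize_header_key(value: str | bytes, lower: bool, encoding: str | None = None) -> bytes:
    # Restated from the proposal so this file runs on its own.
    bytes_value = value if isinstance(value, bytes) else value.encode(encoding or "utf-8")
    return bytes_value.lower() if lower else bytes_value


def test_non_ascii_key_uses_utf8_by_default() -> None:
    key = normalize_header_key("内容类型", lower=True)
    # The default and an explicit "utf-8" must agree.
    assert key == normalize_header_key("内容类型", lower=True, encoding="utf-8")
    # The bytes decode back to the original text.
    assert key.decode("utf-8") == "内容类型"


def test_mixed_key_lowercases_ascii_part() -> None:
    assert normalize_header_key("X-Café", lower=True) == "x-café".encode("utf-8")


def test_non_ascii_letters_are_not_case_folded() -> None:
    # "É" is U+00C9, UTF-8 b"\xc3\x89"; bytes.lower() leaves those bytes alone.
    assert normalize_header_key("X-CAFÉ", lower=True) == b"x-caf\xc3\x89"
```

These run under pytest as-is, or can be called directly as plain functions.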

Checklist

I understand that this PR may be closed in case there was no previous discussion. (This doesn't apply to typos!)
I've added a test for each change that was introduced, and I tried as much as possible to make a single atomic change.
I've updated the documentation accordingly.


tomchristie mentioned this pull request on Jul 23, 2024 in #3238, "Change default encoding to utf-8 in normalize_header_key and normalize_header_value functions" (Open).
