Change default encoding to utf-8 in normalize_header_key and normalize_header_value functions #3238

Open
ZM25XC opened this issue Jul 11, 2024 · 2 comments

ZM25XC commented Jul 11, 2024

Description

Some of my requests fail with a UnicodeEncodeError because header keys and values passed as str are encoded with ASCII by default. Encoding them as UTF-8 instead resolves the error. I propose updating the normalize_header_key and normalize_header_value functions in _utils.py to use UTF-8 as the default encoding.

Steps to Reproduce

  1. Call normalize_header_key or normalize_header_value with a non-ASCII str and no encoding specified.
  2. Observe the UnicodeEncodeError raised by the default ASCII encoding.
  3. Pass encoding="utf-8" and observe that the call succeeds.

Example Code

header_key_unicode = "内容类型"
normalized_key_unicode = normalize_header_key(header_key_unicode, lower=True)
# Raises UnicodeEncodeError because no encoding is given and ASCII is used by default.

normalized_key_unicode_utf8 = normalize_header_key(header_key_unicode, lower=True, encoding="utf-8")
print(normalized_key_unicode_utf8)  # Works correctly with UTF-8 encoding.
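
The example above depends on the library's helper in _utils.py. For anyone who wants to reproduce the behaviour without it, here is a minimal self-contained sketch. The function normalize_header_key_ascii_default is a hypothetical stand-in, written on the assumption that the current code falls back to "ascii" when no encoding is passed; it is not copied from the library.

```python
from typing import Optional, Union


def normalize_header_key_ascii_default(
    value: Union[str, bytes],
    lower: bool,
    encoding: Optional[str] = None,
) -> bytes:
    # Hypothetical stand-in: assumes the existing default encoding is "ascii".
    if isinstance(value, bytes):
        bytes_value = value
    else:
        bytes_value = value.encode(encoding or "ascii")
    return bytes_value.lower() if lower else bytes_value


header_key_unicode = "内容类型"

# Without an explicit encoding, the ASCII fallback rejects the non-ASCII str.
try:
    normalize_header_key_ascii_default(header_key_unicode, lower=True)
except UnicodeEncodeError as exc:
    failed_codec = exc.encoding  # name of the codec that rejected the input
else:
    failed_codec = None

# With an explicit UTF-8 encoding, the same input is encoded successfully.
utf8_key = normalize_header_key_ascii_default(
    header_key_unicode, lower=True, encoding="utf-8"
)
print(failed_codec, utf8_key)
```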

Proposed Solution

Modify the _utils.py file to use UTF-8 as the default encoding:

def normalize_header_key(
    value: str | bytes,
    lower: bool,
    encoding: str | None = None,
) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header key.
    """
    if isinstance(value, bytes):
        bytes_value = value
    else:
        bytes_value = value.encode(encoding or "utf-8")

    return bytes_value.lower() if lower else bytes_value

def normalize_header_value(value: str | bytes, encoding: str | None = None) -> bytes:
    """
    Coerce str/bytes into a strictly byte-wise HTTP header value.
    """
    if isinstance(value, bytes):
        return value
    return value.encode(encoding or "utf-8")

Rationale

Using UTF-8 as the default encoding lets the functions accept any str input without raising a UnicodeEncodeError, because UTF-8 can encode every Unicode character while ASCII covers only 128. UTF-8 is also a superset of ASCII, so ASCII-only headers are encoded to exactly the same bytes as before.
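
As a quick sanity check of the compatibility argument, the sketch below compares the two encodings directly with str.encode. It does not call the library; the header strings are illustrative values only.

```python
ascii_header = "Content-Type"
non_ascii_header = "内容类型"

# ASCII-only input: UTF-8 and ASCII produce identical bytes,
# so changing the default would not alter existing ASCII headers.
same_bytes = ascii_header.encode("utf-8") == ascii_header.encode("ascii")

# Non-ASCII input: UTF-8 encodes it and the bytes decode back losslessly.
utf8_bytes = non_ascii_header.encode("utf-8")
round_trip = utf8_bytes.decode("utf-8")

print(same_bytes, round_trip == non_ascii_header)
```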

@iamjatinyadav

hey guys, can I work on this?


ZM25XC commented Jul 14, 2024

> hey guys, can I work on this?

Thank you for your interest in this issue! I am currently working on a Pull Request to address this problem. If you have any suggestions or would like to review the changes, your input would be greatly appreciated.
