TextEncoder.encodeInto may produce wrong result for one-byte non-ASCII characters #18255
Comments
I looked into it a bit, and the problem seems to come from the fast version of the encode function (Line 371 in 3487fde) with the SeqOneByteString optimisation. If I look at the value of input.as_bytes() inside this function for the provided example, it equals [200, 0] instead of [195, 136].
I'm not entirely sure why this happens, though. Any ideas? @littledivy
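To make the byte mismatch above concrete, here is a minimal sketch, assuming the one-byte string holds Latin-1 (ISO-8859-1) data, which is consistent with the 200 seen in the buffer. The helper name `latin1_to_string` is hypothetical and not part of Deno or rusty_v8; it only illustrates why a byte of 200 is not the UTF-8 encoding of the same character.

```rust
/// Hypothetical helper (not from Deno or rusty_v8): decodes Latin-1 bytes
/// into a Rust `String`. Each Latin-1 byte maps directly to the Unicode
/// code point with the same numeric value, so `u8 as char` is sufficient.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| b as char).collect()
}

/// Returns true if `bytes` can be viewed as a `&str` without transcoding.
/// A lone byte >= 128 is never valid UTF-8, so Latin-1 data containing
/// such bytes fails this check.
fn is_valid_utf8(bytes: &[u8]) -> bool {
    std::str::from_utf8(bytes).is_ok()
}
```

Under this assumption, `latin1_to_string(&[200])` yields "È" (U+00C8), whose UTF-8 bytes are [195, 136], while `is_valid_utf8(&[200])` is false. That matches the expected versus observed values reported above.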
Small correction to @andrewnester:
The problem is that
diff --git a/src/fast_api.rs b/src/fast_api.rs
index e4ac272..8c64c1a 100644
--- a/src/fast_api.rs
+++ b/src/fast_api.rs
@@ -221,10 +221,10 @@ impl FastApiOneByteString {
   pub fn as_str(&self) -> &str {
     // SAFETY: The string is guaranteed to be valid UTF-8.
     unsafe {
-      std::str::from_utf8_unchecked(std::slice::from_raw_parts(
+      std::str::from_utf8(std::slice::from_raw_parts(
         self.data,
         self.length as usize,
-      ))
+      )).unwrap()
     }
   }
 }
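The patch above swaps the unchecked conversion for a checked one that panics on invalid input. As an illustration only, one non-panicking alternative could look like the sketch below. It assumes the one-byte buffer is Latin-1, and the function name `one_byte_to_str` is hypothetical; this is not the fix that was applied upstream.

```rust
use std::borrow::Cow;

/// Hypothetical helper: views a one-byte (assumed Latin-1) buffer as text.
/// Pure ASCII input is borrowed as-is; anything else is transcoded into an
/// owned UTF-8 `String`.
fn one_byte_to_str(bytes: &[u8]) -> Cow<'_, str> {
    if bytes.is_ascii() {
        // ASCII is a subset of both Latin-1 and UTF-8, so the checked
        // conversion cannot fail here and no `unsafe` is needed.
        Cow::Borrowed(std::str::from_utf8(bytes).expect("ASCII is valid UTF-8"))
    } else {
        // Each Latin-1 byte maps to the code point of the same value.
        Cow::Owned(bytes.iter().map(|&b| b as char).collect())
    }
}
```

The design choice here is to pay for an allocation only when a non-ASCII byte is present, so the common ASCII case keeps the zero-copy behaviour that the fast path was presumably introduced for.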
TextEncoder.encodeInto may produce wrong result for one-byte non-ASCII characters (code point range 128 - 255).
To reproduce the bug:
Error:
The issue was introduced in v1.31.2 (I have tested v1.31.1 - v1.31.3).
I might be wrong, but it seems to be PR #17996, which assumes:
Maybe input is not always UTF-8?
cc @littledivy