lib: ensure TextDecoder only removes utf8 BOM on utf8 encoding
#42779
3cc2e04 to d84a4eb
I'm guessing |
d84a4eb to 96f2c21
Updated! I also attempted to add a test, let me know if it is wrong.
96f2c21 to 4b0cb91
Looks right to me. Anyone more encoding-knowledgeable want to review?
@Trott can you help me with these failures so I don't need to approve the Jenkins CI to my entire GH account?
🤦 I totally looked at those conversions too. Thanks!
@phated It appears that the change here causes a failure in
@Trott thanks. It looks like a bug in my test specifically.
const dec = new TextDecoder(i);
assert.strictEqual(dec.encoding, 'utf-16le');
const res = dec.decode(buf);
assert.strictEqual(res, '믯璿攀猀琀가');
When I tried decoding this on current node 16, the bug shows itself in an invalid character at the end: '믯璿攀猀琀가�'
If my last commit doesn't work, I might need to copy the gulp test case into here (even though it is a larger buffer).
I don't know how to test this. In the gulp tests, we save files with different encodings and I have no idea how to craft those buffers inside a Buffer.
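One possible way to build such input without touching the filesystem is to concatenate the raw BOM bytes with text that `Buffer.from` has encoded as UTF-16LE. This is only a sketch: it assumes the gulp fixture is roughly "UTF-8 BOM followed by UTF-16LE text", and the variable names are made up for illustration.

```javascript
'use strict';

// The three bytes of the UTF-8 BOM.
const utf8Bom = Buffer.from([0xEF, 0xBB, 0xBF]);

// 'test' encoded as UTF-16LE: 74 00 65 00 73 00 74 00.
const utf16Body = Buffer.from('test', 'utf16le');

// Assumed shape of the fixture: UTF-8 BOM in front of UTF-16LE content.
const crafted = Buffer.concat([utf8Bom, utf16Body]);

console.log(crafted.toString('hex')); // efbbbf7400650073007400
```

Note that the result is 11 bytes long, so the three BOM bytes shift every following UTF-16 code unit by one byte and leave an odd byte at the end.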
Did you mean to close this? I'd like to keep chipping away at it unless you're opening a new PR or something?
You can create a file and save it in the test temp directory. (I can write that code or point you to a sample, so you don't have to learn that API unless you think you're going to be writing more Node.js tests.) We could also put a file permanently in the test/fixtures directory.
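As a rough illustration of the temp-file approach, the sketch below writes a fixture and reads it back using only the public `fs`, `os`, and `path` modules. It does not use the Node.js test suite's own temp directory helper, and the directory prefix and file name are hypothetical.

```javascript
'use strict';

const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway directory under the OS temp dir (prefix is arbitrary).
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'bom-test-'));
const file = path.join(dir, 'utf16le-with-utf8-bom.txt');

// Assumed fixture content: UTF-8 BOM bytes followed by UTF-16LE text.
const bytes = Buffer.concat([
  Buffer.from([0xEF, 0xBB, 0xBF]),
  Buffer.from('test', 'utf16le'),
]);

fs.writeFileSync(file, bytes);

// Reading without an encoding argument returns the raw bytes unchanged.
const roundTripped = fs.readFileSync(file);
console.log(roundTripped.equals(bytes)); // true

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true, force: true });
```

Reading the file back as a `Buffer` (rather than passing an encoding to `readFileSync`) avoids any re-encoding step before the bytes reach the decoder under test.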
Yeah, I meant to close it. At this point, I can't determine if I'm crazy or not because reading the file into the repl seems to give me the expected results, but my tests fail. The furthest I got before giving up was possibly a re-encoding issue (back into a buffer). One thing that I noticed is that |
While working on gulpjs/remove-bom-stream#8, I noticed a failure in our test case that ensures a UTF-8 BOM at the beginning of a UTF-16 file isn't removed: the BOM was being removed.
I dug into the PR at #30132 and noticed that it lost the utf-8 and utf-16le checks in the refactor.
This should limit the BOM removal to just the UTF-8 encoding. There should also be a follow-up PR that adds the utf-16le BOM removal back to the code.
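The intended rule can be sketched as a small standalone function. This is not the code in Node.js's `TextDecoder` implementation; `stripUtf8Bom` is a hypothetical helper that works on raw bytes and only shows the idea of gating BOM removal on the decoder's encoding.

```javascript
'use strict';

// Hypothetical helper for illustration only, not Node.js internals.
const UTF8_BOM = [0xEF, 0xBB, 0xBF];

function stripUtf8Bom(bytes, encoding) {
  // Only a decoder configured for utf-8 should treat EF BB BF as a BOM.
  if (encoding !== 'utf-8') return bytes;
  const hasBom = bytes.length >= UTF8_BOM.length &&
    UTF8_BOM.every((byte, i) => bytes[i] === byte);
  // subarray() returns a view without copying the underlying memory.
  return hasBom ? bytes.subarray(UTF8_BOM.length) : bytes;
}

// EF BB BF followed by the ASCII bytes of 'test'.
const input = Buffer.from([0xEF, 0xBB, 0xBF, 0x74, 0x65, 0x73, 0x74]);

const asUtf8 = stripUtf8Bom(input, 'utf-8');
const asUtf16 = stripUtf8Bom(input, 'utf-16le');

console.log(asUtf8.length, asUtf16.length); // 4 7
```

With the encoding check in place, the same leading bytes are dropped for `'utf-8'` and left untouched for `'utf-16le'`, which is the behavior the gulp test case expects.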