-
Notifications
You must be signed in to change notification settings - Fork 28.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
buffer: validate UTF8 on fast path #54526
base: main
Are you sure you want to change the base?
Conversation
a930ad4
to
0fda693
Compare
23f1b23
to
8be5c43
Compare
test/parallel/test-buffer-write.js
Outdated
{ | ||
let i = 0; | ||
|
||
while (i < 1_000_000) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a non-deterministic test is unpleasant.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can trigger fast api call using the method @targos implemented in his previous pull requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks to me. I wish we could use a deterministic test. Having to loop a million times to trigger and issue is both slow and error prone.
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #54526 +/- ##
==========================================
+ Coverage 87.31% 87.61% +0.29%
==========================================
Files 649 650 +1
Lines 182755 182892 +137
Branches 35051 35390 +339
==========================================
+ Hits 159575 160236 +661
+ Misses 16452 15925 -527
- Partials 6728 6731 +3
|
This comment was marked as outdated.
This comment was marked as outdated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some lint issues but otherwise LGTM
This comment was marked as outdated.
This comment was marked as outdated.
With the current commits, the following test will fail: let i = 0;
const testStr = "\xc2\x80"; // when stored in 1-byte chars, this is a valid UTF-8 encoding
const expected = Buffer.from(testStr).toString("hex");
for(; i < 1_000_000; i++) {
const buf = Buffer.from(testStr);
const ashex = buf.toString("hex");
if (ashex !== expected) {
console.log(`Decoding changed in iteration ${i} when changing to FastWriteStringUTF8, got ${ashex}, expected ${expected}`);
break;
}
}
if(i<1_000_000) {
console.error("FAILED after %d iterations",i);
} else
console.log("PASSED after %d iterations",i); The source string is stored as UTF16 (compressed to 1 byte), but still needs to be converted to UTF-8. The fast path can only be implemented as memcpy when only characters from the range 0-127 are present. |
I thought |
< when only characters from the range 0-127 are present @lemira does simdutf have this? |
@ShenHongFei can you try again with your project. I think I might have found the issue. |
For future reference: Simple fuzzing: const assert = require('node:assert')
const crypto = require('node:crypto')
for (let i = 0; i < 1e6; i++) {
const len = Math.floor(Math.random() * 128)
const buf = Buffer.allocUnsafe(len)
crypto.getRandomValues(buf)
for (let n = 0; n < 16; n++) {
const start = Math.floor(Math.random() * len)
const end = Math.min(len, start + Math.floor(Math.random() * (len - start)))
const src = buf.subarray(start, end)
const str = src.toString('utf8')
let bufNew = Buffer.alloc(buf.length);
let bufOld = Buffer.alloc(buf.length);
bufNew = bufNew.subarray(0, bufNew.utf8Write(str, start))
bufOld = bufOld.subarray(0, bufOld.utf8WriteLegacy(str, start)) // old implementation
assert.deepStrictEqual(bufNew, bufOld)
}
} |
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
@ronag I tried it, and the problem has been fixed. 👍 |
Commit Queue failed- Loading data for nodejs/node/pull/54526 ✔ Done loading data for nodejs/node/pull/54526 ----------------------------------- PR info ------------------------------------ Title buffer: validate UTF8 on fast path (#54526) Author Robert Nagy <[email protected]> (@ronag) Branch ronag:fast-utf8-2 -> nodejs:main Labels buffer, c++, author ready, needs-ci Commits 1 - buffer: validate UTF8 on fast path Committers 1 - Robert Nagy <[email protected]> PR-URL: https://github.com/nodejs/node/pull/54526 Fixes: https://github.com/nodejs/node/issues/54521 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Paolo Insogna <[email protected]> ------------------------------ Generated metadata ------------------------------ PR-URL: https://github.com/nodejs/node/pull/54526 Fixes: https://github.com/nodejs/node/issues/54521 Reviewed-By: Daniel Lemire <[email protected]> Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Anna Henningsen <[email protected]> Reviewed-By: Yagiz Nizipli <[email protected]> Reviewed-By: Paolo Insogna <[email protected]> -------------------------------------------------------------------------------- ℹ This PR was created on Fri, 23 Aug 2024 14:34:31 GMT ✔ Approvals: 6 ✔ - Daniel Lemire (@lemire): https://github.com/nodejs/node/pull/54526#pullrequestreview-2257470621 ✔ - Benjamin Gruenbaum (@benjamingr) (TSC): https://github.com/nodejs/node/pull/54526#pullrequestreview-2273156047 ✔ - James M Snell (@jasnell) (TSC): https://github.com/nodejs/node/pull/54526#pullrequestreview-2258976699 ✔ - Anna Henningsen (@addaleax): https://github.com/nodejs/node/pull/54526#pullrequestreview-2258958504 ✔ - Yagiz Nizipli (@anonrig) (TSC): https://github.com/nodejs/node/pull/54526#pullrequestreview-2272343097 ✔ - Paolo Insogna (@ShogunPanda) (TSC): https://github.com/nodejs/node/pull/54526#pullrequestreview-2260113366 ✘ Last GitHub CI failed ℹ Last Full PR CI on 2024-08-30T13:34:03Z: https://ci.nodejs.org/job/node-test-pull-request/61707/ ℹ Last Benchmark CI on 2024-08-30T10:28:48Z: https://ci.nodejs.org/view/Node.js%20benchmark/job/benchmark-node-micro-benchmarks/1626/ - Querying data for job/node-test-pull-request/61707/ ✔ Last Jenkins CI successful -------------------------------------------------------------------------------- ✔ Aborted `git node land` session in /home/runner/work/node/node/.ncuhttps://github.com/nodejs/node/actions/runs/10637813307 |
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Adds a "safe" version of convert_latin1_to_utf8 with a maximum output length. Refs: nodejs/node#54526
Fast API handles invalid UTF differently than the slow API. Fixes: nodejs#54521 PR-URL: nodejs#54525
Fast API handles invalid UTF differently than the slow API.
Fixes: #54521
PR-URL: #54525