RangeError: Invalid string length with large files. #35973

Closed
denniscm opened this issue Nov 4, 2020 · 6 comments
Labels
v8 engine Issues and PRs related to the V8 dependency.

Comments

@denniscm

denniscm commented Nov 4, 2020

  • Version: 14.15.0
  • Platform: Windows 10 64-bit
  • Subsystem:

What steps will reproduce the bug?

When I try to read a large JSON file (700 MB, 26,640,222 lines) using a ReadStream, I get the error "RangeError: Invalid string length". Here is the code I'm using to read the file:

const { createReadStream } = require('fs');

const input = createReadStream(`${__dirname}/input/test.json`, { encoding: 'utf8' })
    .on('error', (error) => {
        console.log('Error to read file, check if the filename is correct!');
        console.log(error);
    });

let buildedChunkData = '';

input
    .on('data', (data) => {
        buildedChunkData += data;
    })
    .on('end', () => {
        console.log(JSON.parse(buildedChunkData.toString()));
    });

How often does it reproduce? Is there a required condition?

Only when the file to be read is very large.

What is the expected behavior?

The file should be read without throwing an error.

What do you see instead?

This is the error I get:

buildedChunkData += data;
                            ^
RangeError: Invalid string length
    at ReadStream.<anonymous> (C:\Gitlab\Outros\JsonToCsv\event_vehicle_All_editor.js:104:29)
    at ReadStream.emit (events.js:315:20)
    at addChunk (_stream_readable.js:309:12)
    at readableAddChunk (_stream_readable.js:280:11)
    at ReadStream.Readable.push (_stream_readable.js:223:10)
    at internal/fs/streams.js:226:14
    at FSReqCallback.wrapper [as oncomplete] (fs.js:539:5)

Additional information

The weird thing is that I can read this same large file with Node 12.19.0 and the Node 13.x versions without problems; only on the new 14.x versions do I get this error. Am I doing something wrong, or is there a workaround? It works fine with almost all files; only with large files do I have this issue.

Thank you.

@mmomtchev
Contributor

mmomtchev commented Nov 5, 2020

This is a V8 limitation. I confirm that the maximum number of UTF-16 code units in a string on 64-bit platforms has dropped from
(1 << 30) - 25 = 1073741799 ≈ 1024M
to
(1 << 29) - 24 = 536870888 ≈ 512M (which is also the current limit in Chrome).
@targos

@targos
Member

targos commented Nov 5, 2020

Yes, the limit is defined here:

node/deps/v8/include/v8.h

Lines 3004 to 3007 in eb24573

class V8_EXPORT String : public Name {
 public:
  static constexpr int kMaxLength =
      internal::kApiSystemPointerSize == 4 ? (1 << 28) - 16 : (1 << 29) - 24;

I don't know why it has this value nor whether this could be increased in the future.
/cc @nodejs/v8
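For reference, the effective limit on a given Node.js build can be read at runtime from buffer.constants.MAX_STRING_LENGTH, which mirrors this kMaxLength value. A minimal check might look like this (the expected values in the comment are what the limits above suggest for 64-bit builds, not verified here):

const { constants } = require('buffer');

// Maximum string length, in UTF-16 code units, that V8 allows.
// Expected: 536870888 on Node 14.x (64-bit), 1073741799 on Node 12.x.
console.log(constants.MAX_STRING_LENGTH);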

@targos targos added the v8 engine Issues and PRs related to the V8 dependency. label Nov 5, 2020
@victorgomes

The limitation is due to pointer compression, please see: https://chromium-review.googlesource.com/c/v8/v8/+/2030916

aladdin-add referenced this issue in kataw/kataw Mar 15, 2021
@aladdin-add Can you update these tests? The test runner went crazy again, so I got the same issues on update and some other failures:

RangeError: Invalid string length
    at JSON.stringify (<anonymous>)
    at outputBlock (D:\kataw15\test\runner\tob.ts:112:8)
    at Object.updateTob (D:\kataw15\test\runner\tob.ts:23:18)
    at D:\kataw15\test\cli.ts:32:9
    at Generator.next (<anonymous>)
    at fulfilled (D:\kataw15\node_modules\tslib\tslib.js:114:62)
@paulsmithkc

Consider processing the JSON chunk by chunk as it comes in, rather than trying to load it all into a string and then parsing it.

This will drastically lower the amount of memory needed and speed up the overall file load time.
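For example, a minimal sketch of this approach using the stream-json package (the same streamer that appears in the next comment) might look like the following; the file path is illustrative, and it assumes the top-level JSON value is an array:

const fs = require('fs');
const StreamArray = require('stream-json/streamers/StreamArray');

// Parse the file incrementally: each top-level array element is emitted
// as soon as it has been parsed, so the whole document is never held
// in a single string.
const pipeline = fs
    .createReadStream(`${__dirname}/input/test.json`)
    .pipe(StreamArray.withParser());

pipeline.on('data', ({ key, value }) => {
    // key is the array index, value is the parsed element;
    // process / transform / write out each element here
});

pipeline.on('end', () => console.log('done'));
pipeline.on('error', (err) => console.error(err));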

@LooOOooM

LooOOooM commented Nov 10, 2021

I am also using V8 and got the same error in the code below, at:

buf += d.toString(); // when data is read, stash it in a string buffer

However, I wonder why this even happens when I cut the data down to below 2 GB, like:

-rw-rw-r--. 1 hax hax 1,8G 10. Nov 09:41 data/../../cache/Jan2013-Nov2021-n-trainingdata_separated-max_cutted.csv.json

Can you please help me overcome this problem, perhaps by pointing me to an example?

Thanks!!!

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

function get_json(filePath) {
    return new Promise((resolve, reject) => {
        var stream = fs.createReadStream(filePath, { flags: 'r', encoding: 'utf-8' });
        var buf = '';

        stream.on('data', function (d) {
            buf += d.toString(); // when data is read, stash it in a string buffer
            //pump(); // then process the buffer
        });

        stream.on('end', function () {
            buf = buf.split('\n').join('');
            buf = buf.split('\r').join('');
            resolve(JSON.parse(buf));
        });
    });
}

module.exports = get_json;
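One possible way to rework get_json() along the lines suggested above is to use the StreamArray that is already required, so the document is never concatenated into a single string. This is only a sketch: it assumes the top-level JSON value is an array, and it still keeps all parsed items in memory, just not as one giant string.

const StreamArray = require('stream-json/streamers/StreamArray');
const fs = require('fs');

function get_json(filePath) {
    return new Promise((resolve, reject) => {
        const items = [];
        fs.createReadStream(filePath, { flags: 'r' })
            .pipe(StreamArray.withParser())
            .on('data', ({ value }) => items.push(value)) // each parsed array element
            .on('error', reject)
            .on('end', () => resolve(items));
    });
}

module.exports = get_json;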

@targos
Member

targos commented Nov 20, 2021

Closing as the maximum string length is determined by V8.

@targos targos closed this as completed Nov 20, 2021