Cannot create a string longer than 0x1fffffe8 characters when using data-persistence in server #554
Comments
Thanks for opening this. @allevo we should rework the persistence plugin if we can reproduce this.
Hi @imertz!
I'll try it out and come back to you.
IIRC dpack worked for persisting the file to disk, but not if the file is larger than 512 MB. Here's a naive implementation with streaming support for Node.js with @msgpack/msgpack (basically the current binary format solution with streaming support):

```ts
import type { AnyOrama, RawData } from '@orama/orama';
import { create, load, save } from '@orama/orama';
import fs from 'fs';
import { decode, encode } from '@msgpack/msgpack';

export const persistToFile = async (
  db: AnyOrama,
  outputFile: string,
) => {
  const dbExport = await save(db);
  const msgpack = encode(dbExport);
  const serialized = Buffer.from(
    msgpack.buffer,
    msgpack.byteOffset,
    msgpack.byteLength,
  );
  const writeStream = fs.createWriteStream(outputFile);
  const chunkSize = 1024;
  // Write the export in small hex-encoded chunks so we never
  // convert the whole Buffer into one giant string.
  for (let i = 0; i < serialized.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, serialized.length);
    const chunk = serialized.subarray(i, end);
    const hexChunk = chunk.toString('hex');
    writeStream.write(hexChunk);
  }
  writeStream.end();
  writeStream.on('finish', () => {
    console.log('File has been written as', outputFile);
  });
};

const deserialize = async (inputFile: string) => {
  return new Promise<RawData>((resolve, reject) => {
    const readStream = fs.createReadStream(inputFile, {
      encoding: 'utf8',
      // highWaterMark: 1024,
    });
    const chunks: Buffer[] = [];
    // NB: assumes each chunk contains an even number of hex characters.
    readStream.on('data', (chunk: string) => {
      chunks.push(Buffer.from(chunk, 'hex'));
    });
    readStream.on('end', () => {
      const combinedBuffer = Buffer.concat(chunks);
      const decodedData = decode(combinedBuffer);
      resolve(decodedData as RawData);
    });
    readStream.on('error', (err) => {
      reject(err);
    });
  });
};

export const restoreFromFile = async (inputFile: string) => {
  const deserialized = await deserialize(inputFile);
  const db = await create({
    schema: {
      __placeholder: 'string',
    },
  });
  await load(db, deserialized);
  return db;
};
```

Disclaimer: I extracted these functions from a larger codebase, so I haven't actually run this exact piece of code, but hopefully this helps. Also not sure if the chunking part on […]
We also noticed that you can write the msgpack-encoded binary directly to the file instead of turning it into hex before writing. This makes the msgpack file half the size.
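That size difference is easy to verify with a stdlib-only sketch (the file names and payload here are made up for illustration; the payload stands in for a msgpack-encoded export):

```typescript
import { writeFileSync, statSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Stand-in for a msgpack-encoded database export.
const payload = Buffer.from('some serialized database export');

// Hex route: every byte becomes two characters, so the file doubles in size.
const hexPath = join(tmpdir(), 'db.hex');
writeFileSync(hexPath, payload.toString('hex'));

// Direct binary route: the file size equals the payload size.
const binPath = join(tmpdir(), 'db.msp');
writeFileSync(binPath, payload);

console.log(statSync(hexPath).size === 2 * statSync(binPath).size); // true
```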
Describe the bug

When trying to persist a large amount of data using the `persistToFile` function, Node.js throws an error: `Cannot create a string longer than 0x1fffffe8 characters`. This error is due to the V8 engine's limitation on string size.

To Reproduce

Call the `persistToFile` function with a database large enough that its serialized export exceeds the string-size limit.

Expected behavior

The data should be successfully persisted to the file without any errors.
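For reference, Node exposes this V8 cap directly, so it can be inspected without triggering the error (the exact value can vary across Node/V8 builds; `0x1fffffe8` matches the build that produced this error):

```typescript
import { constants } from 'node:buffer';

// V8's maximum string length, in UTF-16 code units. Buffer#toString
// throws "Cannot create a string longer than ..." past this cap.
console.log(constants.MAX_STRING_LENGTH.toString(16));
```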
Environment Info
Affected areas
Data Insertion
Additional context
Possible Solution:
Consider implementing a streaming approach to write the data to the file, which would avoid having to convert the entire Buffer to a string at once.
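A minimal stdlib-only sketch of that idea (the helper name `writeChunked` is made up): the export Buffer is written in fixed-size binary slices with backpressure handling, so no full-length string is ever created:

```typescript
import { createWriteStream } from 'node:fs';
import { once } from 'node:events';

// Write a large Buffer to disk in fixed-size binary chunks.
// No slice is ever converted to one giant string, so the
// V8 string-length limit is never hit.
export async function writeChunked(
  buf: Buffer,
  outputFile: string,
  chunkSize = 64 * 1024,
): Promise<void> {
  const out = createWriteStream(outputFile);
  for (let i = 0; i < buf.length; i += chunkSize) {
    const chunk = buf.subarray(i, Math.min(i + chunkSize, buf.length));
    // Respect backpressure: pause until the stream drains.
    if (!out.write(chunk)) {
      await once(out, 'drain');
    }
  }
  out.end();
  await once(out, 'finish');
}
```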