Specific points index #112

Open
martindurant opened this issue Jan 20, 2023 · 2 comments

@martindurant

Instead of creating seek points every x bytes, is it theoretically possible to pick specific points in the data stream that we know we will be needing? If yes, is there any appetite for implementing this?

@pauldmccarthy
Owner

pauldmccarthy commented Jan 23, 2023

Hi @martindurant, for gzip/deflate streams I don't think it would be possible to create index points at exact locations, because the zran functions call the zlib inflate function in a way that stops inflation at deflate block boundaries, which are placed somewhat arbitrarily in a typical gzip file (although things would be different if you controlled creation of the file).

However, it would probably not be too much work to adjust the code to create index points at approximately the desired locations - the logic controlling index point creation (excluding points at the start/end of the stream) is here. I suppose the set of desired locations could be passed in at creation and checked here, with a new bit flag controlling whether to create points at spacing intervals or at the specified locations.
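
To illustrate the idea of placing points at approximately the desired locations, here is a minimal Python sketch. It is not the actual zran code: `choose_points`, `block_boundaries` and `wanted` are hypothetical names, and the deflate block boundaries are simply given as a sorted list of uncompressed offsets. The function walks the boundaries in stream order and keeps, for each desired offset, the last boundary at or before it.

```python
def choose_points(block_boundaries, wanted):
    """Pick the block boundaries that best serve the wanted offsets.

    block_boundaries: uncompressed offsets of deflate block boundaries,
                      in increasing order (hypothetical input).
    wanted:           uncompressed offsets we expect to seek to.
    Returns the chosen boundaries in increasing order, without duplicates.
    """
    wanted = sorted(set(wanted))
    chosen = []
    candidate = None  # most recent boundary seen so far
    i = 0             # index of the next wanted offset not yet served

    for boundary in block_boundaries:
        # Every wanted offset strictly before this boundary is best served
        # by the previous boundary, so commit that one.
        while i < len(wanted) and wanted[i] < boundary:
            if candidate is not None and (not chosen or chosen[-1] != candidate):
                chosen.append(candidate)
            i += 1
        candidate = boundary

    # Wanted offsets at or beyond the last boundary are served by it.
    if i < len(wanted) and candidate is not None:
        if not chosen or chosen[-1] != candidate:
            chosen.append(candidate)
    return chosen


boundaries = [0, 100, 250, 400, 900]
points = choose_points(boundaries, wanted=[120, 130, 400, 950])
```

Note that a boundary is only committed once the next boundary has been seen, so a streaming implementation would presumably need to hold on to the candidate's state until then.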

@martindurant
Author

Understood, thanks. The use case here would be individual files within a tar.gz archive, or contiguous array buffers within any of several file formats that store data that way. Those would all have exact byte start/stop values, of course, which won't align with gzip block boundaries - but if you can get the best previous boundary for each location, then seek and read will work pretty well.
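
A rough sketch of what "best previous boundary" means on the read side, assuming the index exposes its points as a sorted list of uncompressed offsets (`point_offsets` and `plan_read` are made-up names for illustration, not an existing API): find the last point at or before the start of the member, decompress from there, throw away the bytes before `start`, and keep the rest.

```python
from bisect import bisect_right


def plan_read(point_offsets, start, stop):
    """Work out how to read the uncompressed byte range [start, stop).

    point_offsets: sorted uncompressed offsets of the index points
                   (hypothetical representation of the index).
    Returns (seek_to, skip, length): resume decompression at the index point
    `seek_to`, discard `skip` bytes, then keep `length` bytes.
    """
    if stop < start:
        raise ValueError("stop must not be smaller than start")
    # Index of the last point whose offset is <= start.
    i = bisect_right(point_offsets, start) - 1
    if i < 0:
        raise ValueError("no index point at or before the requested start")
    seek_to = point_offsets[i]
    return seek_to, start - seek_to, stop - start


point_offsets = [0, 100, 400, 900]
plan = plan_read(point_offsets, start=120, stop=380)
```

The cost of a read is then the `skip` bytes of wasted decompression, which is why having a point just before each member start matters more than even spacing.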
