[EPIC] A collection of items to improve speed of parquet metadata encoding #5853

Open
1 of 4 tasks
alamb opened this issue Jun 7, 2024 · 0 comments
Labels
enhancement (Any new improvement worthy of an entry in the changelog), parquet (Changes to the parquet crate)

alamb commented Jun 7, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

There have been several recent assertions that Parquet is not suitable for handling wide tables with 1000s of columns.

The rationale is usually that wide tables have “large” metadata, which takes a “long time” to decode, often longer than reading the data itself.

This has led to several proposals for new file formats, such as BtrBlocks, Lance V2, and Nimble, and to recent discussions on the parquet mailing list.

However, there are several ways we can improve the performance of the existing thrift decoding in parquet-rs, and this ticket collects ideas for doing so.
