[arrow-row] include offsets at the Row level so that consumers can slice the data buffer by 'column' #4847

Closed
judahrand opened this issue Sep 22, 2023 · 0 comments
Labels
development-process Related to development process of arrow-rs enhancement Any new improvement worthy of a entry in the changelog

Comments


judahrand commented Sep 22, 2023

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I've been looking at the arrow-row format for use in a machine learning 'Feature Store' application. I'm trying to find the most efficient way to move between the Arrow format and a key-value store that stores values on a per-'cell' basis. I am aware that the arrow-row format is not designed for this sort of use case and is in fact not ideal: it alters some of the data so that rows sort correctly under lexicographic comparison, and its encoding is not guaranteed to be stable from release to release. Even so, it struck me as the fastest way to begin investigating this idea.

I thought I'd open an issue to see whether this functionality might have a wider use case, and whether a PR to introduce it might be accepted.

Describe the solution you'd like

I would like to be able to access the encoded data for an individual column within each row, for example through per-column offsets exposed at the Row level.
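To make the request concrete, here is a minimal sketch of what per-row column offsets could look like. It is not the arrow-row API: `RowWithOffsets`, `from_cells`, `column`, and the layout itself are hypothetical names invented for illustration, and the sketch assumes each cell has already been encoded to bytes by some other means.

```rust
/// Hypothetical row layout (NOT part of arrow-row): the encoded bytes of
/// every column concatenated into one buffer, plus the offset at which
/// each column starts.
#[derive(Debug, Clone, PartialEq)]
pub struct RowWithOffsets {
    data: Vec<u8>,
    /// `offsets[i]..offsets[i + 1]` is the byte range of column `i`,
    /// so this vector always holds `num_columns + 1` entries.
    offsets: Vec<usize>,
}

impl RowWithOffsets {
    /// Builds a row from already-encoded cells, one byte slice per column.
    pub fn from_cells(cells: &[&[u8]]) -> Self {
        let total: usize = cells.iter().map(|c| c.len()).sum();
        let mut data = Vec::with_capacity(total);
        let mut offsets = Vec::with_capacity(cells.len() + 1);
        offsets.push(0);
        for cell in cells {
            data.extend_from_slice(cell);
            offsets.push(data.len());
        }
        Self { data, offsets }
    }

    /// Number of columns stored in this row.
    pub fn num_columns(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Returns the encoded bytes of column `index`, or `None` if the
    /// index is out of range.
    pub fn column(&self, index: usize) -> Option<&[u8]> {
        let start = *self.offsets.get(index)?;
        let end = *self.offsets.get(index + 1)?;
        Some(&self.data[start..end])
    }

    /// The whole row as one contiguous buffer.
    pub fn as_bytes(&self) -> &[u8] {
        &self.data
    }
}
```

With a layout along these lines, a consumer could write `row.column(i)` straight into a per-cell key-value store without decoding the whole row. The cost is one extra `usize` per column per row, which is one reason a real implementation might prefer a different representation.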

Describe alternatives you've considered

For my use case, it may make sense to fork arrow-row and create a crate that does minimal encoding and guarantees a stable encoding.

Additional context

@judahrand judahrand added the enhancement Any new improvement worthy of a entry in the changelog label Sep 22, 2023
@tustvold tustvold closed this as not planned Jan 1, 2024
@tustvold tustvold added the development-process Related to development process of arrow-rs label Jan 5, 2024