[arrow-row] include offsets at the Row level so that consumers can slice the data buffer by 'column'
#4847
Labels
development-process
Related to development process of arrow-rs
enhancement
Any new improvement worthy of a entry in the changelog
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I've been looking at the arrow-row format for use in a machine learning 'Feature Store' application. I'm trying to find the most efficient way to move between the Arrow format and a key-value store that stores values on a per-'cell' basis. I am aware that the arrow-row format is not designed for this sort of use case: it alters some of the data so that rows sort lexicographically, and its encoding is not guaranteed to be stable from release to release. Even so, it struck me as the fastest way to begin investigating this idea.
I thought I'd open an issue to see whether this functionality might have a wider use case and whether a PR to introduce it might be accepted.
Describe the solution you'd like
I would like to be able to access the data for an encoded column from each row.
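A minimal, std-only sketch of the kind of access this request has in mind. Everything here is hypothetical and for illustration only: `RowWithOffsets`, `column_ends`, and `column` are made-up names, not part of arrow-row, and the sketch assumes each row carries the end offset of every encoded column alongside its bytes.

```rust
/// Hypothetical view over one encoded row plus per-column end offsets.
/// NOTE: not an arrow-row type; an assumption made for this sketch.
pub struct RowWithOffsets<'a> {
    /// The encoded bytes of the whole row.
    data: &'a [u8],
    /// `column_ends[i]` is the byte position where column `i` ends.
    column_ends: &'a [usize],
}

impl<'a> RowWithOffsets<'a> {
    /// Builds the view, returning `None` if the offsets are not
    /// non-decreasing or do not end exactly at `data.len()`.
    pub fn new(data: &'a [u8], column_ends: &'a [usize]) -> Option<Self> {
        let mut prev = 0;
        for &end in column_ends {
            if end < prev || end > data.len() {
                return None;
            }
            prev = end;
        }
        if prev != data.len() {
            return None;
        }
        Some(Self { data, column_ends })
    }

    /// Number of encoded columns in this row.
    pub fn num_columns(&self) -> usize {
        self.column_ends.len()
    }

    /// Returns the encoded bytes of column `idx`, or `None` if out of range.
    pub fn column(&self, idx: usize) -> Option<&'a [u8]> {
        let end = *self.column_ends.get(idx)?;
        // A column starts where the previous one ended (or at 0 for the first).
        let start = if idx == 0 { 0 } else { self.column_ends[idx - 1] };
        Some(&self.data[start..end])
    }
}
```

With something along these lines, a consumer could write each column's slice to a key-value store as a separate 'cell' without decoding the row first. The returned bytes would still be in the row encoding, so they would need to be read back through the same encoding version.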
Describe alternatives you've considered
For my use-case it may make sense to fork arrow-row and create a crate which does minimal encoding + guarantees a stable encoding.
Additional context