-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding support for Polars structs #10943
Conversation
Oh, this looks interesting. I'm excited to see where it goes. |
Would you mind adding some examples of how this would be used, not in the code necessarily but just in the description? Maybe this is just for opening things like parquet files that support all the polars data types? Also, is this ready to land after review? |
Yes, this is ready for review. I updated the description. Does that work? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tested and it works as expected. I'm good with these changes.
Thanks! |
#[inline] | ||
fn arr_to_value( | ||
dt: &DataType, | ||
arr: &dyn Array, | ||
idx: usize, | ||
span: Span, | ||
) -> Result<Value, ShellError> { | ||
macro_rules! downcast { | ||
($casttype:ident) => {{ | ||
let arr = &*(arr as *const dyn Array as *const $casttype); | ||
arr.value_unchecked(idx) | ||
}}; | ||
} | ||
|
||
// Not loving the unsafe here, however this largely based off the one | ||
// example I found for converting Array values in: | ||
// polars_core::chunked_array::ops::any_value::arr_to_any_value | ||
unsafe { | ||
match dt { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the safety condition depends on the correct DataType
being passed, this function itself has to be unsafe
.
Even if we declare it as such I really find it hard to validate that this whole operation is sound.
The call sites of this function need to guarantee both a valid type is ascribed and that all requirements with regards to the pointer and index are met.
Beyond that I am not super comfortable with a manul type reinterpretation that depends on the data representation as given to us by an external crate in this case both polars_core
and arrow2
. The former has been quite willing to break APIs so it feels to me like a ticking time bomb to make safety assumptions across crates outside of a simpler well documented interface.
If I have to read the crate source code to reassure me that this is sound and can't just read the documentation of a constrained unsafe
API I think this is just asking for future trouble.
This reverts commit fe92051.
Reverts #10943 The current implementation of `arr_to_value` is unsound, as it allows casts of arbitrary data to arbitrary types without being marked `unsafe`. The full safety requirements to perform both the cast and the following unchecked access are not as clear that a simple change of `fn arr_to_value` to `unsafe fn arr_to_value` could be blessed without further investigation. cc @ayax79
Provides support for reading Polars structs. This allows opening of supported files (jsonl, parquet, etc) that contain rows with structured data. The following attached json lines file([receipts.jsonl.gz](https://github.com/nushell/nushell/files/13311476/receipts.jsonl.gz)) contains a customer column with structured data. This json lines file can now be loaded via `dfr open` and will render as follows: <img width="525" alt="Screenshot 2023-11-09 at 10 09 18" src="https://github.com/nushell/nushell/assets/56345/4b26ccdc-c230-43ae-a8d5-8af88a1b72de"> This also addresses some cleanup of date handling and utilizing timezones where provided. This pull request only addresses reading data from polars structs. I will address converting nushell data to polars structs in a future request as this change is large enough as it is. --------- Co-authored-by: Jack Wright <[email protected]>
Reverts nushell#10943 The current implementation of `arr_to_value` is unsound, as it allows casts of arbitrary data to arbitrary types without being marked `unsafe`. The full safety requirements to perform both the cast and the following unchecked access are not as clear that a simple change of `fn arr_to_value` to `unsafe fn arr_to_value` could be blessed without further investigation. cc @ayax79
Provides support for reading Polars structs. This allows opening of supported files (jsonl, parquet, etc) that contain rows with structured data. The following attached json lines file([receipts.jsonl.gz](https://github.com/nushell/nushell/files/13311476/receipts.jsonl.gz)) contains a customer column with structured data. This json lines file can now be loaded via `dfr open` and will render as follows: <img width="525" alt="Screenshot 2023-11-09 at 10 09 18" src="https://github.com/nushell/nushell/assets/56345/4b26ccdc-c230-43ae-a8d5-8af88a1b72de"> This also addresses some cleanup of date handling and utilizing timezones where provided. This pull request only addresses reading data from polars structs. I will address converting nushell data to polars structs in a future request as this change is large enough as it is. --------- Co-authored-by: Jack Wright <[email protected]>
Reverts nushell#10943 The current implementation of `arr_to_value` is unsound, as it allows casts of arbitrary data to arbitrary types without being marked `unsafe`. The full safety requirements to perform both the cast and the following unchecked access are not as clear that a simple change of `fn arr_to_value` to `unsafe fn arr_to_value` could be blessed without further investigation. cc @ayax79
Provides support for reading Polars structs. This allows opening of supported files (jsonl, parquet, etc) that contain rows with structured data.
The following attached json lines file(receipts.jsonl.gz) contains a customer column with structured data. This json lines file can now be loaded via
dfr open
and will render as follows:This also addresses some cleanup of date handling and utilizing timezones where provided.
This pull request only addresses reading data from polars structs. I will address converting nushell data to polars structs in a future request as this change is large enough as it is.