Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement eq comparison for StructArray #5960

Closed
alamb opened this issue Jun 26, 2024 · 6 comments · Fixed by #5961
Closed

Implement eq comparison for StructArray #5960

alamb opened this issue Jun 26, 2024 · 6 comments · Fixed by #5961
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog

Comments

@alamb
Copy link
Contributor

alamb commented Jun 26, 2024

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
arrow::compute::kernels::cmp::eq does not support StructArray

Here is an example

use arrow::array::{ArrayRef, Int32Array, StructArray};
use arrow::compute::kernels::cmp::eq;
use arrow::datatypes::{DataType, Field, Schema};
use std::sync::Arc;

fn main() {
    let nested_schema = Arc::new(Schema::new(vec![
        Field::new("id", DataType::Int32, true),
        Field::new("lat", DataType::Int32, true),
        Field::new("long", DataType::Int32, true),
    ]));
    let schema = Arc::new(Schema::new(vec![
        Field::new("value", DataType::Int32, true),
        Field::new(
            "nested",
            DataType::Struct(nested_schema.fields.clone()),
            true,
        ),
    ]));

    let arr1: ArrayRef = Arc::new(StructArray::from(vec![
        (
            Arc::new(Field::new("id", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
        (
            Arc::new(Field::new("lat", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
        (
            Arc::new(Field::new("long", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
    ]));

    let arr2: ArrayRef = Arc::new(StructArray::from(vec![
        (
            Arc::new(Field::new("id", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
        (
            Arc::new(Field::new("lat", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
        (
            Arc::new(Field::new("long", DataType::Int32, true)),
            Arc::new(Int32Array::from(vec![1, 2, 3])) as ArrayRef,
        ),
    ]));

    let eq_result = eq(&arr1, &arr2).unwrap();
    println!("{:?}", eq_result);
}

It currenty fails like this:

thread 'main' panicked at src/main.rs:56:38:
called `Result::unwrap()` on an `Err` value: InvalidArgumentError("Invalid comparison operation: Struct([Field { name: \"id\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"lat\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"long\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }]) == Struct([Field { name: \"id\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"lat\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }, Field { name: \"long\", data_type: Int32, nullable: true, dict_id: 0, dict_is_ordered: false, metadata: {} }])")

Describe the solution you'd like
I would like eq and the other comparison kernels to support StructArray and the example to pass

Describe alternatives you've considered
We can implement comparison downstream via arrow::array::make_comparator as done in apache/datafusion#11117

Additional context
This was reported upstream in datafusion: apache/datafusion#10749

@jayzhan211 added comparison to the dynamic comparator in #5792

@alamb
Copy link
Contributor Author

alamb commented Jun 26, 2024

Here is the relevant error that is happening:

return Err(ArrowError::InvalidArgumentError(format!(

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 26, 2024

Note that we have a discussion about where to implement comparison for List #5407

The conclusion is that we implement it in datafusion, I have done it already apache/datafusion#11091

If we have the similar configurable null issue for struct, we should also implement it in datafusion.

@alamb
Copy link
Contributor Author

alamb commented Jun 26, 2024

Thanks @jayzhan211 -- I didn't know that.

If this is a design decision in arrow-rs perhaps we can update the documentation / the error message to make it clear that not only is operation not supported, it will not be (and hint to use make_comparator instead)

@jayzhan211
Copy link
Contributor

jayzhan211 commented Jun 26, 2024

There is docs in make_comparator #5942 . But error message could be improved

@alamb
Copy link
Contributor Author

alamb commented Jun 26, 2024

There is docs in make_comparator #5942 . But error message could be improved

Ah, thanks I missed that. I will make a PR to improve the error message

@alamb
Copy link
Contributor Author

alamb commented Jun 26, 2024

#5961

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate enhancement Any new improvement worthy of a entry in the changelog
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants