Improve Error Handling and Readibility for downcasting `Decimal128Array` #4168

retikulum · 2022-11-10T13:21:46Z

Which issue does this PR close?

Part of #3152.

Rationale for this change

It is very clear in the issue but there are different schemas while downcasting an ArrayRef to a related arrow array type. This is the new PR of improving downcasting to Decimal128Array

What changes are included in this PR?

as_decimal128_array is created in datafusion\common\src\cast.rs
Refactor parts where following lines are used:

as_any().downcast_ref::<Decimal128Array>().unwrap()

as_decimal_array()

This commit passed related tests.
Change get_decimal_value_from_array functions return type ScalarValue to Result<ScalarValue>
Out of scope but dyn_cmp_dict is added to datafusion\physical-expr\Cargo.toml for passing test_dictionary_type_to_array_coersion. Before adding to this, following error was returned.

Error: ArrowError(CastError("Comparing array of type Dictionary(Int32, Utf8) with array of type Dictionary(Int32, Utf8) requires \"dyn_cmp_dict\" feature"))

Are there any user-facing changes?

I am not sure about it.

ozankabak

Thank you for these changes, much appreciated!

I saw two issues with code style/structure and left inline comments. I think your other PRs may have similar issues in them too. It would be great if you can improve code quality by applying these suggestions to those code pieces too.

Short PRs are typically preferable to long PRs, but IMO this is a case where the opposite is true. Since the same kind of work was broken down to many PRs, the suggestions you get in one of those PRs will probably end up being applicable to many of them. Therefore, it sometimes is better to have one cohesive PR in cases like this.

datafusion/common/src/scalar.rs

alamb

Thank you @retikulum . While I agree with @ozankabak 's suggestions I also think this PR is an improvement over the current state of the code so I would also be happy to merge it in as is.

Please let me know if you would like to continue to improve it or if I should plan to merge as is

retikulum · 2022-11-10T20:54:49Z

Thank you for these changes, much appreciated!

I saw two issues with code style/structure and left inline comments. I think your other PRs may have similar issues in them too. It would be great if you can improve code quality by applying these suggestions to those code pieces too.

Short PRs are typically preferable to long PRs, but IMO this is a case where the opposite is true. Since the same kind of work was broken down to many PRs, the suggestions you get in one of those PRs will probably end up being applicable to many of them. Therefore, it sometimes is better to have one cohesive PR in cases like this.

Thanks for review @ozankabak. These recommendations are great for improving code quality. I will change the suggested pieces.

However, I was confused with PR sizes. While we discussed implementation in the issue, @alamb also suggested that these PRs should be small ( but he suggested a few functions unlike my one function per PR). Rather than implementing one function at one PR, it seems better to implement 3-5 (or more) functions for a PR. My next PRs will include more function implementation.

retikulum · 2022-11-10T20:56:25Z

Thank you @retikulum . While I agree with @ozankabak 's suggestions I also think this PR is an improvement over the current state of the code so I would also be happy to merge it in as is.

Please let me know if you would like to continue to improve it or if I should plan to merge as is

Thanks for the feedback. I will change suggested parts and resolve conflicts tomorrow then it will be ready to merge.

ozankabak · 2022-11-10T21:00:19Z

Thanks for review @ozankabak. These recommendations are great for improving code quality. I will change the suggested pieces.

Great, happy to help!

... (but he suggested a few functions unlike my one function per PR) ...

This is basically what I meant as well. This kind of grouping makes sense for PRs like this.

alamb · 2022-11-11T00:30:08Z

However, I was confused with PR sizes. While we discussed implementation in the issue, @alamb also #3152 (comment) that these PRs should be small ( but he suggested a few functions unlike my one function per PR). Rather than implementing one function at one PR, it seems better to implement 3-5 (or more) functions for a PR. My next PRs will include more function implementation.

Yeah there is no specific rule here.

Basically I have found the smaller your PRs the easier it will be to find enough someone with enough contiguous time to review them. For larger PRs (e.g. 500+ lines) I often need to spend 30-60 minutes to properly review them and finding blocks of time like that for me is much harder than finding 5-10 minutes for smaller PRs

alamb · 2022-11-11T11:44:04Z

Thanks again @retikulum

ursabot · 2022-11-11T11:52:17Z

Benchmark runs are scheduled for baseline = 5883e43 and contender = f2f8465. f2f8465 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

retikulum added 4 commits November 8, 2022 19:34

improve error messages while downcasting Int32Array

0219753

Merge branch 'master' of https://github.com/retikulum/arrow-datafusion

d49b74f

Merge branch 'master' of https://github.com/retikulum/arrow-datafusion

673977f

improve error messages while downcasting Decimal128Array

ab02fc6

github-actions bot added the physical-expr Physical Expressions label Nov 10, 2022

ozankabak reviewed Nov 10, 2022

View reviewed changes

datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved

datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved

alamb approved these changes Nov 10, 2022

View reviewed changes

retikulum added 2 commits November 11, 2022 12:42

Merge branch 'master' of https://github.com/retikulum/arrow-datafusion

c8c8feb

improve codestyle and resolve conflicts

a0ab039

alamb approved these changes Nov 11, 2022

View reviewed changes

alamb merged commit f2f8465 into apache:master Nov 11, 2022

retikulum deleted the 3152_improve_downcast_decimal128_array branch November 16, 2022 10:44

tustvold mentioned this pull request Nov 27, 2022

Make Binary Dictionary Operations Optional #4386

Closed

tustvold mentioned this pull request Jan 20, 2023

Add dictionary_expresions feature (#4386) #4999

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve Error Handling and Readibility for downcasting `Decimal128Array` #4168

Improve Error Handling and Readibility for downcasting `Decimal128Array` #4168

retikulum commented Nov 10, 2022

ozankabak left a comment

alamb left a comment

retikulum commented Nov 10, 2022

retikulum commented Nov 10, 2022

ozankabak commented Nov 10, 2022 •

edited

Loading

alamb commented Nov 11, 2022

alamb commented Nov 11, 2022

ursabot commented Nov 11, 2022

Improve Error Handling and Readibility for downcasting Decimal128Array #4168

Improve Error Handling and Readibility for downcasting Decimal128Array #4168

Conversation

retikulum commented Nov 10, 2022

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

Are there any user-facing changes?

ozankabak left a comment

Choose a reason for hiding this comment

alamb left a comment

Choose a reason for hiding this comment

retikulum commented Nov 10, 2022

retikulum commented Nov 10, 2022

ozankabak commented Nov 10, 2022 • edited Loading

alamb commented Nov 11, 2022

alamb commented Nov 11, 2022

ursabot commented Nov 11, 2022

Improve Error Handling and Readibility for downcasting `Decimal128Array` #4168

Improve Error Handling and Readibility for downcasting `Decimal128Array` #4168

ozankabak commented Nov 10, 2022 •

edited

Loading