Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Error Handling and Readibility for downcasting Decimal128Array #4168

Merged
merged 6 commits into from
Nov 11, 2022
Merged

Improve Error Handling and Readibility for downcasting Decimal128Array #4168

merged 6 commits into from
Nov 11, 2022

Conversation

retikulum
Copy link
Contributor

Which issue does this PR close?

Part of #3152.

Rationale for this change

It is very clear in the issue but there are different schemas while downcasting an ArrayRef to a related arrow array type. This is the new PR of improving downcasting to Decimal128Array

What changes are included in this PR?

  • as_decimal128_array is created in datafusion\common\src\cast.rs
  • Refactor parts where following lines are used:
as_any().downcast_ref::<Decimal128Array>().unwrap()
as_decimal_array()
  • This commit passed related tests.
  • Change get_decimal_value_from_array functions return type ScalarValue to Result<ScalarValue>
  • Out of scope but dyn_cmp_dict is added to datafusion\physical-expr\Cargo.toml for passing test_dictionary_type_to_array_coersion. Before adding to this, following error was returned.
Error: ArrowError(CastError("Comparing array of type Dictionary(Int32, Utf8) with array of type Dictionary(Int32, Utf8) requires \"dyn_cmp_dict\" feature"))

Are there any user-facing changes?

I am not sure about it.

@github-actions github-actions bot added the physical-expr Physical Expressions label Nov 10, 2022
Copy link
Contributor

@ozankabak ozankabak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for these changes, much appreciated!

I saw two issues with code style/structure and left inline comments. I think your other PRs may have similar issues in them too. It would be great if you can improve code quality by applying these suggestions to those code pieces too.

Short PRs are typically preferable to long PRs, but IMO this is a case where the opposite is true. Since the same kind of work was broken down to many PRs, the suggestions you get in one of those PRs will probably end up being applicable to many of them. Therefore, it sometimes is better to have one cohesive PR in cases like this.

datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved
datafusion/common/src/scalar.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @retikulum . While I agree with @ozankabak 's suggestions I also think this PR is an improvement over the current state of the code so I would also be happy to merge it in as is.

Please let me know if you would like to continue to improve it or if I should plan to merge as is

@retikulum
Copy link
Contributor Author

Thank you for these changes, much appreciated!

I saw two issues with code style/structure and left inline comments. I think your other PRs may have similar issues in them too. It would be great if you can improve code quality by applying these suggestions to those code pieces too.

Short PRs are typically preferable to long PRs, but IMO this is a case where the opposite is true. Since the same kind of work was broken down to many PRs, the suggestions you get in one of those PRs will probably end up being applicable to many of them. Therefore, it sometimes is better to have one cohesive PR in cases like this.

Thanks for review @ozankabak. These recommendations are great for improving code quality. I will change the suggested pieces.

However, I was confused with PR sizes. While we discussed implementation in the issue, @alamb also suggested that these PRs should be small ( but he suggested a few functions unlike my one function per PR). Rather than implementing one function at one PR, it seems better to implement 3-5 (or more) functions for a PR. My next PRs will include more function implementation.

@retikulum
Copy link
Contributor Author

Thank you @retikulum . While I agree with @ozankabak 's suggestions I also think this PR is an improvement over the current state of the code so I would also be happy to merge it in as is.

Please let me know if you would like to continue to improve it or if I should plan to merge as is

Thanks for the feedback. I will change suggested parts and resolve conflicts tomorrow then it will be ready to merge.

@ozankabak
Copy link
Contributor

ozankabak commented Nov 10, 2022

Thanks for review @ozankabak. These recommendations are great for improving code quality. I will change the suggested pieces.

Great, happy to help!

... (but he suggested a few functions unlike my one function per PR) ...

This is basically what I meant as well. This kind of grouping makes sense for PRs like this.

@alamb
Copy link
Contributor

alamb commented Nov 11, 2022

However, I was confused with PR sizes. While we discussed implementation in the issue, @alamb also #3152 (comment) that these PRs should be small ( but he suggested a few functions unlike my one function per PR). Rather than implementing one function at one PR, it seems better to implement 3-5 (or more) functions for a PR. My next PRs will include more function implementation.

Yeah there is no specific rule here.

Basically I have found the smaller your PRs the easier it will be to find enough someone with enough contiguous time to review them. For larger PRs (e.g. 500+ lines) I often need to spend 30-60 minutes to properly review them and finding blocks of time like that for me is much harder than finding 5-10 minutes for smaller PRs

@alamb alamb merged commit f2f8465 into apache:master Nov 11, 2022
@alamb
Copy link
Contributor

alamb commented Nov 11, 2022

Thanks again @retikulum

@ursabot
Copy link

ursabot commented Nov 11, 2022

Benchmark runs are scheduled for baseline = 5883e43 and contender = f2f8465. f2f8465 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@retikulum retikulum deleted the 3152_improve_downcast_decimal128_array branch November 16, 2022 10:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
physical-expr Physical Expressions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants