[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics, option for SortMergeJoin #4219

mingmwang · 2022-11-15T10:30:07Z

Which issue does this PR close?

Closes #4139
Closes #4230.

Rationale for this change

Choose join orderings using cost-based optimization.

What changes are included in this PR?

This PR covers the following changes:

Rename the rule HashBuildProbeOrder to JoinSelection.
Expand the rule's capabilities so that it makes a cost-based decision about which PartitionMode (Partitioned or CollectLeft) is optimal for HashJoin.
Add a session-level configuration, prefer_hash_join, to prefer HashJoin or SortMergeJoin, and document that SortMergeJoin support is currently experimental.

Are these changes tested?

Are there any user-facing changes?

… available statistics

mingmwang · 2022-11-15T10:32:35Z

@alamb @Dandandan @yahoNanJing @jackwener
Please help to take a look.

isidentical

This is a super cool PR @mingmwang! I am wondering whether we should keep everything in JoinSelection or split it up (or maybe rename it to JoinSideSelection to make it precise that this is a local join side optimizer)?

isidentical · 2022-11-15T20:08:40Z

datafusion/core/src/physical_plan/joins/sort_merge_join.rs

+ // TODO stats: it is not possible in general to know the output size of joins
+ // There are some special cases though, for example:
+ // - `A LEFT JOIN B ON A.col=B.col` with `COUNT_DISTINCT(B.col)=COUNT(B.col)`
+ estimate_join_statistics(


Really glad that this can also be used in other places 💯

isidentical · 2022-11-15T20:10:12Z

datafusion/core/src/physical_plan/planner.rs

+ // Sort-Merge join support currently is experimental
+ if join_filter.is_some() {
+ // TODO SortMergeJoinExec need to support join filter
+ Err(DataFusionError::Plan("SortMergeJoinExec does not support join_filter now.".to_string()))


Suggested change

Err(DataFusionError::Plan("SortMergeJoinExec does not support join_filter now.".to_string()))

Err(DataFusionError::NotImplemented("SortMergeJoinExec does not support join_filter now.".to_string()))

Sure, will do.
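
To illustrate the distinction behind this suggestion, here is a minimal sketch. `PlanError` and `plan_sort_merge_join` are hypothetical stand-ins written for this example, not DataFusion's actual `DataFusionError` or planner code; the idea is only that a "not supported yet" condition gets its own variant so callers can tell it apart from an invalid plan.

```rust
/// Hypothetical stand-in for an error type with separate variants for
/// "the plan is invalid" and "the feature is not implemented yet".
#[derive(Debug, PartialEq)]
pub enum PlanError {
    Plan(String),
    NotImplemented(String),
}

/// Hypothetical planner step: sort-merge join is rejected when a join filter
/// is present, and the rejection is reported as `NotImplemented`.
pub fn plan_sort_merge_join(has_join_filter: bool) -> Result<&'static str, PlanError> {
    if has_join_filter {
        return Err(PlanError::NotImplemented(
            "SortMergeJoinExec does not support join_filter now.".to_string(),
        ));
    }
    Ok("SortMergeJoinExec")
}
```

A caller can then match on the variant, for example to fall back to a hash join only when the failure is `NotImplemented`.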

isidentical · 2022-11-15T20:15:12Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ if let Some(size) = plan.statistics().total_byte_size {
+ size < collection_size_threshold
+ } else if let Some(row_count) = plan.statistics().num_rows {
+ row_count < collection_size_threshold


Question: are we treating both bytes and rows equally here? It seems that collection_size_threshold is in bytes, whereas row_count is a number of rows. I guess we could normalize it a bit if we want to pursue this (e.g. collection_size_threshold / SOME_MAGIC_CONSTANT), but otherwise it might be a bit off from the desirable scenario.

My original thinking was to have the threshold configuration represent either 1M bytes or 1M rows.
Or maybe we can keep two separate configuration items and document them clearly.
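
The "two separate configuration items" idea could look roughly like the sketch below. All names here (`PlanStats`, `CollectThresholds`, `fits_in_single_partition`) are hypothetical and are not taken from the PR; returning `false` when no statistics are available is also an assumption made for this example.

```rust
/// Hypothetical stand-in for a plan's statistics (not DataFusion's `Statistics`).
#[derive(Debug, Clone, Copy, Default)]
pub struct PlanStats {
    pub total_byte_size: Option<usize>,
    pub num_rows: Option<usize>,
}

/// Hypothetical pair of thresholds, one per unit, so bytes are never
/// compared against a row limit or vice versa.
#[derive(Debug, Clone, Copy)]
pub struct CollectThresholds {
    pub max_bytes: usize,
    pub max_rows: usize,
}

/// Returns true when the input looks small enough to collect into one partition.
/// Byte size is preferred; the row count is consulted only when the byte size is unknown.
pub fn fits_in_single_partition(stats: PlanStats, limits: CollectThresholds) -> bool {
    match (stats.total_byte_size, stats.num_rows) {
        (Some(bytes), _) => bytes < limits.max_bytes,
        (None, Some(rows)) => rows < limits.max_rows,
        // Assumption: with no statistics at all, do not collect.
        (None, None) => false,
    }
}
```

Keeping the units apart means each limit can be documented and tuned on its own, at the cost of one extra setting.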

isidentical · 2022-11-15T20:19:54Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ }
+}
+
+fn try_collect_left(


Would it make sense to describe the logic here for each of the different scenarios? (I was a bit lost until the end trying to figure out each state and how it should behave.)

Sure, I will add more comments.

alamb · 2022-11-15T22:24:12Z

I will try and review this carefully tomorrow

mingmwang · 2022-11-16T03:03:45Z

There is a bug with HashJoin CollectLeft:

#4230

…epartition_joins

jackwener · 2022-11-20T10:24:41Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ ) {
+ Ok(Arc::new(new_join))
+ } else {
+ // TODO avoid adding ProjectionExec again and again, only adding Final Projection


Very careful 👍! I also ran into this problem in Doris.

I thought this was a DataFusion-specific problem, because for HashJoin DataFusion always builds the left side. If DataFusion could also build the right side, there would be no need to swap the join, and this problem could be avoided.

Some databases always build the same side; TiDB does this too.

alamb · 2022-11-20T12:11:57Z

As I wrote elsewhere,

I plan to review this and other join related PRs tomorrow. I apologize for the delays. The join work is really neat, but it is not a high priority at the moment in IOx so I have had to prioritize other work higher and do join related

I appreciate the help that @jackwener @mingmwang are giving each other in the review process. 🙏

alamb

I think this looks really good @mingmwang -- thank you.

I recommend moving the settings into ConfigOptions to make them more discoverable and documented. I really like the idea of a feature that is off by default as we work out the details.

The only other thing that would be nice to see for this feature is some sort of overall integration test (e.g. that shows a plan with join reordering happening as well as a SortMergeJoin). That is present in the unit tests but not the overall integration tests

Also, is the SortMergeJoin exercised anywhere in sql level tests?

For anyone else reviewing: I found https://github.com/apache/arrow-datafusion/pull/4219/files?w=1 easier to review for diffs.

alamb · 2022-11-21T16:52:37Z

datafusion/core/src/execution/context.rs

@@ -1228,6 +1228,11 @@ pub struct SessionConfig {
 pub collect_statistics: bool,
 /// Should DataFusion optimizer run a top down process to reorder the join keys
 pub top_down_join_key_reordering: bool,
+ /// Should DataFusion optimizer prefer HashJoin over SortMergeJoin.
+ /// HashJoin can work more efficently than SortMergeJoin but consumes more memory.
+ pub prefer_hash_join: bool,


What would you think about moving these new settings into ConfigOptions (where they are visible via SHOW and automatically documented)?

Sure, I will move the newly added settings to ConfigOptions.

alamb · 2022-11-21T16:53:26Z

datafusion/core/src/physical_optimizer/join_selection.rs

+// specific language governing permissions and limitations
+// under the License.
+
+//! Utilizing exact statistics from sources to avoid scanning data


this comment seems incorrect for this module

This came from the original hash_build_probe_order.rs. I will fix it in this PR.

alamb · 2022-11-21T16:53:40Z

datafusion/core/src/physical_optimizer/join_selection.rs

+use crate::error::Result;
+use crate::physical_plan::rewrite::TreeNodeRewritable;
+
+/// For hash join with the partition mode [PartitionMode::Auto], JoinSelection rule will make


I like this design with PartitionMode::Auto

alamb · 2022-11-21T16:54:23Z

datafusion/core/src/physical_optimizer/join_selection.rs

+}
+
+// TODO we need some performance test for Right Semi/Right Join swap to Left Semi/Left Join in case that the right side is smaller but not much smaller.
+// TODO In PrestoSQL, the optimizer flips join sides only if one side is much smaller than the other by more than SIZE_DIFFERENCE_THRESHOLD times, by default is is 8 times.


the prestosql approach makes sense to me

Actually, I am not very sure about this, because our HashJoin implementation is quite different from PrestoSQL's. I think we need to do more benchmarking on this.
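
For reference, the ratio-based rule described in the TODO can be sketched as follows. This is only an illustration of the idea as stated in the comment; `should_swap_inputs` is a hypothetical helper, and the default of 8 is taken from the TODO's description rather than from any benchmark.

```rust
/// Assumed default ratio, as described in the TODO above (8 times).
pub const SIZE_DIFFERENCE_THRESHOLD: usize = 8;

/// Returns true when the right input is smaller than the left input by more
/// than `threshold` times. Unknown sizes never trigger a swap.
pub fn should_swap_inputs(left: Option<usize>, right: Option<usize>, threshold: usize) -> bool {
    match (left, right) {
        // saturating_mul avoids overflow for very large size estimates.
        (Some(l), Some(r)) => r.saturating_mul(threshold) < l,
        _ => false,
    }
}
```

With this rule a right side of 100 bytes against a left side of 1000 bytes is swapped, while 200 bytes against 1000 bytes is left alone. Whether that margin suits DataFusion's HashJoin is exactly the open benchmarking question raised above.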

alamb · 2022-11-21T16:55:03Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ collection_size_threshold: usize,
+) -> bool {
+ // Currently we do not trust the 0 value from stats, due to stats collection might have bug
+ // TODO check the logic in datasource::get_statistics_with_limit()


is this TODO worth tracking with a ticket?

Sure, will do.
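
A minimal sketch of the "do not trust 0" idea from the code comment: treat a reported zero as if the statistic were missing. `trusted_stat` is a hypothetical helper name, not a function from the PR.

```rust
/// Treats a reported value of 0 as unknown, on the assumption that statistics
/// collection may report 0 incorrectly.
pub fn trusted_stat(value: Option<usize>) -> Option<usize> {
    value.filter(|&v| v > 0)
}
```

Any later size comparison then only has to handle the `None` case once, instead of checking for zero in several places.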

alamb · 2022-11-21T16:57:44Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ match filter {
+ Some(filter) => {


Instead of match filter I think you can use map like:

filter.map(|filter| { let column_indicies = ...

Not critical, I just figured I would point it out

Sure, will change it.
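
The suggested refactor can be illustrated with a small standalone sketch. `Filter`, `Side`, and both functions are hypothetical stand-ins rather than DataFusion's `JoinFilter` code; the point is only that a `match` that rebuilds `Some(..)` and passes `None` through is equivalent to `Option::map`.

```rust
/// Hypothetical stand-ins for a join filter and the side each column comes from.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Side {
    Left,
    Right,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Filter {
    pub column_sides: Vec<Side>,
}

fn flip(side: Side) -> Side {
    match side {
        Side::Left => Side::Right,
        Side::Right => Side::Left,
    }
}

/// Explicit `match` form: `Some` is rebuilt, `None` is passed through.
pub fn swap_filter_with_match(filter: Option<Filter>) -> Option<Filter> {
    match filter {
        Some(f) => Some(Filter {
            column_sides: f.column_sides.into_iter().map(flip).collect(),
        }),
        None => None,
    }
}

/// Equivalent `Option::map` form.
pub fn swap_filter_with_map(filter: Option<Filter>) -> Option<Filter> {
    filter.map(|f| Filter {
        column_sides: f.column_sides.into_iter().map(flip).collect(),
    })
}
```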

alamb · 2022-11-21T17:00:43Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ PartitionMode::Auto => {
+ try_collect_left(hash_join, Some(collect_left_threshold))
+ .unwrap()
+ .or_else(|| Some(partitioned_hash_join(hash_join).unwrap()))


Why does this code (and below) use unwrap? I think they should return an error rather than panicking

I have another PR to address this. The interface of the Rewrite/transform closures needs to be changed.

#4318
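
As a rough sketch of what error propagation could look like once the closure interface allows it: the code below uses hypothetical stand-ins (`Plan`, simplified signatures, `String` errors) rather than the real `try_collect_left` and `partitioned_hash_join` from the PR. It keeps the same fallback order as the quoted snippet but replaces `unwrap` with `?`.

```rust
/// Hypothetical stand-in for the chosen physical plan.
#[derive(Debug, PartialEq)]
pub enum Plan {
    CollectLeft,
    Partitioned,
}

/// Simplified stand-in: yields a CollectLeft plan only when the left side is known to be small.
pub fn try_collect_left(left_rows: Option<usize>, threshold: usize) -> Result<Option<Plan>, String> {
    match left_rows {
        Some(rows) if rows < threshold => Ok(Some(Plan::CollectLeft)),
        _ => Ok(None),
    }
}

/// Simplified stand-in that can fail, so there is an error to propagate.
pub fn partitioned_hash_join(partitions: usize) -> Result<Plan, String> {
    if partitions == 0 {
        return Err("cannot partition a join into zero partitions".to_string());
    }
    Ok(Plan::Partitioned)
}

/// Auto mode: try CollectLeft first, fall back to Partitioned, and surface
/// any error to the caller instead of panicking.
pub fn select_auto(left_rows: Option<usize>, threshold: usize, partitions: usize) -> Result<Plan, String> {
    match try_collect_left(left_rows, threshold)? {
        Some(plan) => Ok(plan),
        None => partitioned_hash_join(partitions),
    }
}
```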

alamb · 2022-11-21T17:07:02Z

datafusion/core/src/physical_plan/joins/mod.rs

@@ -29,6 +29,9 @@ pub enum PartitionMode {
 Partitioned,
 /// Left side will collected into one partition
 CollectLeft,
+ /// When set to Auto, DataFusion optimizer will decide which PartitionMode mode(Partitioned/CollectLeft) is optimal based on statistics.
+ /// It will also consider swapping the left and right inputs for the Join


The optimizer also will swap the inputs for Partitioned and CollectLeft mode too, right? As written, this comment could be confusing and imply that inputs will only be swapped if the mode is set to Auto

You are right. In particular, when the mode is CollectLeft, some join types cannot run in CollectLeft mode.

https://github.com/apache/arrow-datafusion/blob/d5d2de3362649db85ad54161ee28d9374ed3437c/datafusion/core/src/physical_optimizer/join_selection.rs#L284-L286

That's why in this PR I also modified the existing UTs in https://github.com/apache/arrow-datafusion/blob/d5d2de3362649db85ad54161ee28d9374ed3437c/datafusion/core/tests/sql/joins.rs
to make sure both the CollectLeft and Partitioned modes have enough test coverage.

alamb · 2022-11-21T17:07:23Z

datafusion/core/src/execution/context.rs

@@ -1228,6 +1228,11 @@ pub struct SessionConfig {
 pub collect_statistics: bool,
 /// Should DataFusion optimizer run a top down process to reorder the join keys
 pub top_down_join_key_reordering: bool,
+ /// Should DataFusion optimizer prefer HashJoin over SortMergeJoin.
+ /// HashJoin can work more efficently than SortMergeJoin but consumes more memory.


Suggested change

/// HashJoin can work more efficently than SortMergeJoin but consumes more memory.

/// HashJoin can work more efficiently than SortMergeJoin but consumes more memory. Defaults to true

alamb · 2022-11-21T17:10:31Z

datafusion/core/src/execution/context.rs

+ /// HashJoin can work more efficently than SortMergeJoin but consumes more memory.
+ pub prefer_hash_join: bool,
+ /// The maximum estimated size in bytes for the left input a hash join will be collected into one partition
+ pub hash_join_collect_left_threshold: usize,


I recommend calling this setting something that doesn't have 'left' and 'right' as that can get confusing.

How about hash_join_single_partition_threshold?

mingmwang · 2022-11-23T13:12:30Z

@alamb Would you mind taking another look?

alamb

I think it looks great -- thank you @mingmwang

alamb · 2022-11-23T18:28:01Z

datafusion/core/tests/sql/joins.rs

+}
+
+#[tokio::test]
+async fn sort_merge_join_on_date32() -> Result<()> {


alamb · 2022-11-23T18:29:55Z

I will plan to merge this PR tomorrow unless anyone else would like more time to review

isidentical · 2022-11-23T18:48:23Z

I wasn't able to go through the last revision in detail but overall this looks great 💯 A minor question of mine is still standing though (maybe let's do a follow up ticket on it): #4219 (comment) (collection_size_threshold represents the size in both the number of rows and the number of bytes, which is a bit confusing because the distinction is only internal when one form of statistics is not available)

jackwener

Nice Job👍.
I tried to check out this branch and git pull.
Look like we need fix conflict.

jackwener · 2022-11-24T01:20:24Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ }
+}
+
+fn swap_join_type(join_type: JoinType) -> JoinType {


we can put it into JoinType
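
A sketch of what moving the helper onto the enum might look like. The `JoinType` below is a standalone stand-in defined for this example, and its variant list is an assumption, not necessarily the set DataFusion had at the time of this PR.

```rust
/// Standalone stand-in enum; the variant set is assumed for illustration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinType {
    Inner,
    Left,
    Right,
    Full,
    LeftSemi,
    RightSemi,
    LeftAnti,
    RightAnti,
}

impl JoinType {
    /// Returns the join type that gives the same result once the two inputs are exchanged.
    pub fn swap(self) -> JoinType {
        match self {
            JoinType::Inner => JoinType::Inner,
            JoinType::Full => JoinType::Full,
            JoinType::Left => JoinType::Right,
            JoinType::Right => JoinType::Left,
            JoinType::LeftSemi => JoinType::RightSemi,
            JoinType::RightSemi => JoinType::LeftSemi,
            JoinType::LeftAnti => JoinType::RightAnti,
            JoinType::RightAnti => JoinType::LeftAnti,
        }
    }
}
```

As a method, the mapping lives next to the type it describes, and the exhaustive `match` forces an update whenever a new variant is added.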

jackwener · 2022-11-24T01:22:17Z

datafusion/core/src/physical_optimizer/join_selection.rs

+ ) {
+ Ok(Arc::new(new_join))
+ } else {
+ // TODO avoid adding ProjectionExec again and again, only adding Final Projection


Some databases always build the same side; TiDB does this too.

alamb · 2022-11-24T11:36:47Z

Look like we need fix conflict.

I am not sure what conflict you ran into @jackwener. I checked this branch out locally and merged master to this branch and reran all the tests and they passed.

ursabot · 2022-11-24T11:42:36Z

Benchmark runs are scheduled for baseline = 22fdbcf and contender = 561be4f. 561be4f is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-datafusion-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

jackwener · 2022-11-24T12:03:35Z

It must have been my mistake. 🥲

[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the…

40506ef

… available statistics

github-actions bot added the core Core DataFusion crate label Nov 15, 2022

isidentical reviewed Nov 15, 2022

mingmwang added 3 commits November 16, 2022 22:37

Fix HashJoin CollectLeft bug, refine UT to cover 'enable'/'disable' r…

38ebcd8

…epartition_joins

merge with upstream

e920f1f

add comments

e1a98a0

mingmwang changed the title ~~[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the available statistics~~ [CBO] JoinSelection Rule, select HashJoin Partition Mode based on the available statistics and Join Type Nov 16, 2022

mingmwang changed the title ~~[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the available statistics and Join Type~~ [CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics Nov 16, 2022

ignore 0 stats

d5d2de3

jackwener reviewed Nov 20, 2022

alamb requested a review from Dandandan November 21, 2022 16:51

alamb reviewed Nov 21, 2022

alamb changed the title ~~[CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics~~ [CBO] JoinSelection Rule, select HashJoin Partition Mode based on the Join Type and available statistics, option for SortMergeJoin Nov 21, 2022

mingmwang added 3 commits November 23, 2022 16:57

Resolve review comments, add intg UT for SMJ

6402288

Resolve conflicts, merge with upstream

16ebf8b

fix conflicts

7d370ee

github-actions bot added the optimizer Optimizer rules label Nov 23, 2022

mingmwang added 4 commits November 23, 2022 20:00

tiny fix to doc

264822c

refine swap_join_filter()

525bed1

update configs.md

aca543c

fix configs.md

8049a36

alamb approved these changes Nov 23, 2022

datafusion/core/tests/sql/joins.rs

}

#[tokio::test]

async fn sort_merge_join_on_date32() -> Result<()> {

alamb Nov 23, 2022

👍

jackwener approved these changes Nov 24, 2022

alamb merged commit 561be4f into apache:master Nov 24, 2022

Dandandan mentioned this pull request Apr 14, 2023

Improve DataFusion scalability as more cores are added #5999

Open
