-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add rule to reimplement Eliminate cross join
and remove it in planner
#4185
Conversation
ab369b8
to
18a9743
Compare
2fcbd4b
to
acc0989
Compare
\n TableScan: person\ | ||
\n SubqueryAlias: p2\ | ||
\n TableScan: person\ | ||
\n Inner Join: person.id = p.id\ | ||
\n CrossJoin:\ | ||
\n TableScan: person\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the queries now have more cross joins? Is this expected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expected
It's just because it's just build plan without optimization.
quick_test ()
just invoke logical_plan_with_dialect
acc0989
to
aee4741
Compare
aee4741
to
c45744d
Compare
a6e42cb
to
c5582ea
Compare
|
||
// look for expressions of the form `<column> = <column>` | ||
let mut possible_join_keys = vec![]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to see those logic is moved out from the planner.
I think it is because you make the rule run a bottom-up process, then lots of projections are added. If we make it a top-down |
Thanks @mingmwang 👍, I will fix them |
c5582ea
to
beabb9c
Compare
Thanks @mingmwang review !❤️ |
@jackwener I will take a closer look Tomorrow. |
I will try and review this in the next day or two -- but it may take me a while. Hopefully @mingmwang can help review as well |
b178f6e
to
9a8ae5c
Compare
There is specific logic in
To
|
Nice idea👍! It's a great future ticket. |
I plan to review this and other join related PRs tomorrow. I apologize for the delays. The join work is really neat, but it is not a high priority at the moment in IOx so I have had to prioritize other work higher and do join related I appreciate the help that @jackwener @mingmwang are giving each other in the review process. 🙏 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through this PR quite thoroughly -- thank you very much @jackwener
I'll plan to merge this tomorrow unless there are any other comments
Inner Join: nation.n_regionkey = region.r_regionkey | ||
Inner Join: supplier.s_nationkey = nation.n_nationkey | ||
Inner Join: partsupp.ps_suppkey = supplier.s_suppkey | ||
Projection: part.p_partkey, part.p_mfgr, supplier.s_name, supplier.s_address, supplier.s_phone, supplier.s_acctbal, supplier.s_comment, nation.n_name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found the difference much easier to see with whitespace blind diff
https://github.com/apache/arrow-datafusion/pull/4185/files?w=1
I found github's rendering of this really hard to understand the change in plan -- I drew out the join graphs by hand to make sure they were the same.
fn extract_possible_join_keys( | ||
expr: &Expr, | ||
accum: &mut Vec<(Column, Column)>, | ||
) -> Result<()> { | ||
match expr { | ||
Expr::BinaryExpr(BinaryExpr { left, op, right }) => match op { | ||
Operator::Eq => match (left.as_ref(), right.as_ref()) { | ||
(Expr::Column(l), Expr::Column(r)) => { | ||
accum.push((l.clone(), r.clone())); | ||
Ok(()) | ||
} | ||
_ => Ok(()), | ||
}, | ||
Operator::And => { | ||
extract_possible_join_keys(left, accum)?; | ||
extract_possible_join_keys(right, accum) | ||
} | ||
_ => Ok(()), | ||
}, | ||
_ => Ok(()), | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you are thinking about this copy: https://github.com/apache/arrow-datafusion/blob/bcd624855778384ee27648161de73951e3fb6ea1/datafusion/optimizer/src/reduce_cross_join.rs#L238-L279
LogicalPlan::CrossJoin(_) => { | ||
flatten_join_inputs(child, possible_join_keys, all_inputs)?; | ||
} | ||
_ => all_inputs.push((*child).clone()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eventually it would be awesome to avoid so much cloneing -- maybe as a follow on PR
@@ -849,14 +955,14 @@ mod tests { | |||
|
|||
let expected = vec![ | |||
"Filter: (t4.c < UInt32(15) OR t4.c = UInt32(688)) AND (t4.c < UInt32(15) OR t3.c = UInt32(688) OR t3.b = t4.b) [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", | |||
" Inner Join: t1.a = t3.a [a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32, a:UInt32, b:UInt32, c:UInt32]", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
I took the liberty of merging master into the branch so that we avoid any possible logical conflicts. Thanks again for all the comments @mingmwang and @Dandandan |
I agree it would be a good idea for a future ticket. There is also https://github.com/apache/arrow-datafusion/blob/master/datafusion/optimizer/src/rewrite_disjunctive_predicate.rs which I think tries to do the same thing |
cc @xudong963 |
2981931
to
27f965a
Compare
Pulled master into it. Thanks @alamb ❤️ |
Yes, thanks for your mention :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to see the logical moved from planner to optimizer, nice clean.
Maybe @DhamoPS would like to take a look, I remember he has worked similar work!
27f965a
to
2e7a4c4
Compare
fix new conflict. |
Good call @xudong963 -- I am going to merge this PR now as it has been outstanding for a long time. Thanks again @jackwener the recent contributions are awesome |
Benchmark runs are scheduled for baseline = bfce076 and contender = d355f69. d355f69 is a master commit associated with this PR. Results will be available as each benchmark for each run completes. |
Which issue does this PR close?
Closes #4176.
Part of #4267
eliminate_cross_join
#4176..eliminate_cross_join
from plannereliminate_cross_join
to avoid to useglobal-state
. In new implementation, we just depends only on the shape of the subtree.Rationale for this change
What changes are included in this PR?
Reimplement
Eliminate cross join
Are these changes tested?
Test
Reorder join
for eliminating cross join.Are there any user-facing changes?