[FLINK-12371] [table-planner-blink] Add support for converting (NOT) IN/ (NOT) EXISTS to semi-join/anti-join, and generating optimized logical plan #8317

Merged: 5 commits, May 11, 2019

@godfreyhe (Contributor) commented Apr 30, 2019

What is the purpose of the change

Add support for converting (NOT) IN / (NOT) EXISTS to semi-join/anti-join, and for generating an optimized logical plan.

Brief change log

  • Add SemiJoin to support anti-join and non-equi join
  • Add FlinkSubQueryRemoveRule to convert (NOT) IN/ (NOT) EXISTS to SemiJoin
  • Add SimplifyFilterConditionRule to simplify filter condition (including filter condition in RexSubQuery)
  • Update BatchExecHashJoinRule to convert FlinkLogicalSemiJoin to BatchExecHashSemiJoin, and update BatchExecNestedLoopJoinRule and BatchExecSortMergeJoinRule likewise
  • Add StreamExecSemiJoinRule to convert FlinkLogicalSemiJoin to StreamExecSemiJoin
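
For readers unfamiliar with the terms, the semantics targeted by the rewrite can be sketched in plain Java. This is only an illustration of what a semi-join and an anti-join return, not code from this PR: the class and method names (SemiAntiJoinSketch, semiJoin, antiJoin) are hypothetical, rows are modelled as plain integers, and the special NULL handling that SQL requires for NOT IN is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

/** Illustrative only: plain-Java semantics of semi-join and anti-join, not planner code. */
class SemiAntiJoinSketch {

    /** Semi-join: keep each left row that has at least one matching right row. */
    static <L, R> List<L> semiJoin(List<L> left, List<R> right, BiPredicate<L, R> condition) {
        List<L> result = new ArrayList<>();
        for (L l : left) {
            if (right.stream().anyMatch(r -> condition.test(l, r))) {
                result.add(l); // emitted once, no matter how many right rows match
            }
        }
        return result;
    }

    /** Anti-join: keep each left row that has no matching right row. */
    static <L, R> List<L> antiJoin(List<L> left, List<R> right, BiPredicate<L, R> condition) {
        List<L> result = new ArrayList<>();
        for (L l : left) {
            if (right.stream().noneMatch(r -> condition.test(l, r))) {
                result.add(l);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Integer> orders = List.of(1, 2, 2, 3);
        List<Integer> customers = List.of(2, 2, 4);

        // Roughly: SELECT * FROM orders o WHERE o.id IN (SELECT id FROM customers)
        System.out.println(semiJoin(orders, customers, (o, c) -> o.equals(c)));

        // Roughly: ... WHERE NOT EXISTS (SELECT 1 FROM customers c WHERE c.id = o.id)
        // NOT IN needs extra NULL handling in SQL, which this sketch ignores.
        System.out.println(antiJoin(orders, customers, (o, c) -> o.equals(c)));
    }
}
```

Because the condition is an arbitrary predicate, the same sketch also covers the non-equi case mentioned in the first bullet, for example `(o, c) -> o < c`.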

Verifying this change

This change added tests and can be verified as follows:

  • Added FlinkJoinPushExpressionsRuleTest that validates FlinkJoinPushExpressionsRule
  • Added SimplifyFilterConditionRuleTest that validates SimplifyFilterConditionRule
  • Added tests that validate the logical plan after FlinkSubQueryRemoveRule is applied
  • Added tests that validate the physical plan after the physical conversion rules are applied

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

@flinkbot (Collaborator) commented

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

*
* <p>The effect is something like the SQL {@code IN} operator.
*/
public class SemiJoin extends Join {
Contributor

cc @danny0405, is this consistent with Calcite's latest changes?

Contributor

No, SemiJoin will be deprecated in Calcite 1.20.0; we should use Join with JoinRelType#SEMI instead.

Contributor

@godfreyhe how about trying to be consistent with Calcite? Will that involve a lot of changes?

Contributor Author

Yes, we can borrow some implementations from apache/calcite#1157 now and remove them after we upgrade to a Calcite version that contains that PR (maybe Calcite 1.20). It turns out that only Join, JoinRelType, RelDecorrelator and some utility methods need to be copied into the Flink project, and the result is clearer than before. I found some bugs and some possible improvements, and have passed that feedback on to @danny0405.
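
The design point in this thread, one Join node that carries its join type instead of a dedicated SemiJoin subclass, can be sketched roughly as follows. Everything here is hypothetical and self-contained: JoinKind and JoinNode are stand-ins invented for illustration, not Calcite's JoinRelType or Join, and field lists are plain strings. The sketch only shows why a type-carrying node is enough: the join kind alone determines whether the right input's fields appear in the output.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch; none of these types are Calcite or Flink classes. */
class JoinTypeSketch {

    /** Stand-in for a join-type enum that also covers semi- and anti-joins. */
    enum JoinKind {
        INNER(true), LEFT(true), SEMI(false), ANTI(false);

        private final boolean projectsRight;

        JoinKind(boolean projectsRight) {
            this.projectsRight = projectsRight;
        }

        /** Whether output rows carry the right input's fields. */
        boolean projectsRight() {
            return projectsRight;
        }
    }

    /** A single join node; the kind, not the class, decides the output row type. */
    static final class JoinNode {
        final JoinKind kind;
        final List<String> leftFields;
        final List<String> rightFields;

        JoinNode(JoinKind kind, List<String> leftFields, List<String> rightFields) {
            this.kind = kind;
            this.leftFields = leftFields;
            this.rightFields = rightFields;
        }

        List<String> outputFields() {
            List<String> out = new ArrayList<>(leftFields);
            if (kind.projectsRight()) {
                out.addAll(rightFields); // semi- and anti-joins skip this step
            }
            return out;
        }
    }

    public static void main(String[] args) {
        List<String> left = List.of("o_id", "o_total");
        List<String> right = List.of("c_id");
        for (JoinKind kind : JoinKind.values()) {
            System.out.println(kind + " -> " + new JoinNode(kind, left, right).outputFields());
        }
    }
}
```

With this shape, rules that match joins can branch on the kind rather than on the node class, which is presumably why only a handful of classes needed to be copied.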

@godfreyhe changed the title from "[FLINK-12371] [table-planner-blink] Add support for converting (NOT) IN/ (NOT) EXISTS to SemiJoin, and generating optimized logical plan" to "[FLINK-12371] [table-planner-blink] Add support for converting (NOT) IN/ (NOT) EXISTS to semi-join/anti-join, and generating optimized logical plan" on May 8, 2019
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.calcite.rel.core;
Contributor

Maybe we should avoid giving this package exactly the same name as Calcite's? It will confuse developers about which class we are importing.

Contributor Author

If the packages of the Flink Join and the Calcite Join differ, we need to copy all the code associated with Join from Calcite to Flink. This class will be deleted when calcite-1.20 is released (maybe in about a month).

@KurtYoung (Contributor) commented

The changes look good to me; please fix the failing tests.

@godfreyhe (Contributor Author) commented

@KurtYoung I have fixed the failed test cases; the remaining failure is in the flink-yarn-tests module.

@KurtYoung merged commit 76ae39a into apache:master on May 11, 2019
@godfreyhe deleted the FLINK-12371 branch on June 1, 2019 09:53