
[FLINK-12098] [table-planner-blink] Add support for generating optimized logical plan for simple group aggregate on stream #8110

Merged 11 commits into apache:master on Apr 11, 2019

Conversation

godfreyhe (Contributor) commented Apr 3, 2019

What is the purpose of the change

Add support for generating an optimized logical plan for simple group aggregates on streams.

Brief change log

  • add StreamExecGroupAggregateRule to convert a logical aggregate into a StreamExecGroupAggregate (see the query sketch after this list)
  • add TwoStageOptimizedAggregateRule to rewrite a StreamExecGroupAggregate into two-stage (local/global) aggregates
  • add StreamExecRetractionRules to handle retraction messages
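For orientation, here is a minimal, hypothetical sketch of the kind of query these rules target. It is not part of this PR's tests; the table name, field names, and the EnvironmentSettings-based blink planner setup (which only became available with the Flink 1.9 release) are illustrative.

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.StreamTableEnvironment;

public class SimpleGroupAggExample {
    public static void main(String[] args) {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        EnvironmentSettings settings = EnvironmentSettings.newInstance()
                .useBlinkPlanner()
                .inStreamingMode()
                .build();
        StreamTableEnvironment tEnv = StreamTableEnvironment.create(env, settings);

        // Register a small in-memory stream as a table (names are illustrative).
        DataStream<Tuple2<String, Double>> orders = env.fromElements(
                Tuple2.of("beer", 2.0), Tuple2.of("diaper", 4.0), Tuple2.of("beer", 3.0));
        tEnv.registerDataStream("Orders", orders, "product, amount");

        // A simple (non-windowed) group aggregate on an unbounded stream: the shape of
        // query that StreamExecGroupAggregateRule translates into a
        // StreamExecGroupAggregate node in the optimized plan.
        Table result = tEnv.sqlQuery("SELECT product, SUM(amount) FROM Orders GROUP BY product");
        System.out.println(tEnv.explain(result));
    }
}

Whether TwoStageOptimizedAggregateRule then splits such an aggregate into local/global stages depends on the configured optimization options (e.g. mini-batching).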

Verifying this change

This change added tests and can be verified as follows:

  • Added AggregateTest for StreamExecGroupAggregateRule
  • Added TwoStageAggregateTest for TwoStageOptimizedAggregateRule
  • Added RetractionRulesTest for retraction rules

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): (no)
  • The public API, i.e., is any changed class annotated with @Public(Evolving): (no)
  • The serializers: (no)
  • The runtime per-record code paths (performance sensitive): (no)
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: (no)
  • The S3 file system connector: (no)

Documentation

  • Does this pull request introduce a new feature? (no)
  • If yes, how is the feature documented? (not documented)

@flinkbot (Collaborator) commented Apr 3, 2019

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community to review your pull request. We will use this comment to track the progress of the review.

Review Progress

  • ❓ 1. The [description] looks good.
  • ❓ 2. There is [consensus] that the contribution should go into Flink.
  • ❓ 3. Needs [attention] from.
  • ❓ 4. The change fits into the overall [architecture].
  • ❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.


The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands
The @flinkbot bot supports the following commands:

  • @flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
  • @flinkbot approve all to approve all aspects
  • @flinkbot approve-until architecture to approve everything until architecture
  • @flinkbot attention @username1 [@username2 ..] to require somebody's attention
  • @flinkbot disapprove architecture to remove an approval you gave earlier

import java.sql.{Date, Time, Timestamp}

/** The initial accumulator for Max with retraction aggregate function */
class MaxWithRetractAccumulator[T] {
Contributor

Can we make this Java?

Contributor Author

OK. Should MinWithRetractAggFunctionTest be implemented in Java in this PR?

import java.sql.{Date, Time, Timestamp}

/** The initial accumulator for Min with retraction aggregate function */
class MinWithRetractAccumulator[T] {
Contributor

Can we make this Java?

groupSize: Int,
needRetraction: Boolean,
aggs: Seq[AggregateCall]): Array[Boolean] = {

Contributor

delete blank line

inputRowType: RelDataType,
groupSet: Array[Int],
typeFactory: FlinkTypeFactory): RelDataType = {

Contributor

delete blank line

* Derives accumulators names from aggregate
*/
def inferAggAccumulatorNames(aggInfoList: AggregateInfoList): Array[String] = {

Contributor

delete blank line

/** The initial accumulator for Max with retraction aggregate function. */
public static class MaxWithRetractAccumulator<T> {
public T max;
public Long distinctCount;
Contributor

Just use mapSize? The name is a bit confusing with distinct agg.

Contributor Author

OK

@Override
public MaxWithRetractAccumulator<T> createAccumulator() {
MaxWithRetractAccumulator<T> acc = new MaxWithRetractAccumulator<>();
acc.max = getInitValue(); // max
Contributor

Why not just use NULL to represent the init value for all sub-classes?

Contributor Author

I didn't change any logic when porting to Java. For Java, using NULL as the init value is OK.
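
For illustration, a minimal sketch of an accumulator that uses null as the shared "no value yet" marker, as suggested above. This is not this PR's actual MaxWithRetractAggFunction; the class and field names are hypothetical.

import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not the PR's class: null marks "no value accumulated yet". */
public class MaxWithRetractSketch<T extends Comparable<T>> {
    public T max;                               // null until the first value arrives
    public long mapSize;                        // number of distinct values in the map
    public final Map<T, Long> map = new HashMap<>();

    public void accumulate(T value) {
        if (max == null || value.compareTo(max) > 0) {
            max = value;
        }
        Long count = map.get(value);
        if (count == null) {
            map.put(value, 1L);
            mapSize++;                          // a new distinct value
        } else {
            map.put(value, count + 1);
        }
    }
}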

hasMax = true;
}
}
if (!hasMax) {
Contributor

Why would this happen?

godfreyhe (Contributor Author) commented Apr 8, 2019

The behavior of deleting expired data in the state backend is uncertain, so the mapSize counter may still exist while the map data has already been deleted once both have expired.

I have added some comments.
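
To make the scenario concrete, here is a hypothetical sketch (not the PR's actual retract() implementation; names are illustrative) of the branch being discussed: after a retraction the max is recomputed from the map, and if the state backend has already expired the map entries while the counter survived, the accumulator is reset instead of keeping a stale max.

import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch, not the PR's code: retraction with an expired-state fallback. */
public class RetractExpirySketch {
    static class Acc<T> {
        T max;            // null means no max
        long mapSize;     // number of distinct values expected in the map
        final Map<T, Long> map = new HashMap<>();
    }

    static <T extends Comparable<T>> void retract(Acc<T> acc, T value) {
        Long count = acc.map.get(value);
        if (count != null) {
            if (count == 1L) {
                acc.map.remove(value);
                acc.mapSize--;
            } else {
                acc.map.put(value, count - 1);
            }
        }
        if (value.equals(acc.max)) {
            // Recompute the max from the remaining keys.
            boolean hasMax = false;
            T newMax = null;
            for (T key : acc.map.keySet()) {
                if (!hasMax || key.compareTo(newMax) > 0) {
                    newMax = key;
                    hasMax = true;
                }
            }
            if (!hasMax) {
                // The map entries may already have been expired by the state backend
                // while the counter survived: reset rather than keep a stale max.
                acc.max = null;
                acc.mapSize = 0L;
            } else {
                acc.max = newMax;
            }
        }
    }
}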

@godfreyhe force-pushed the FLINK-12098 branch 2 times, most recently from f980267 to bf9ee8c on April 9, 2019 01:40
@KurtYoung (Contributor)

LGTM, +1 to merge

@KurtYoung KurtYoung merged commit 517a04f into apache:master Apr 11, 2019
@godfreyhe godfreyhe deleted the FLINK-12098 branch April 11, 2019 07:22
HuangZhenQiu pushed a commit to HuangZhenQiu/flink that referenced this pull request Apr 22, 2019
…ed logical plan for simple group aggregate on stream (apache#8110)
sunhaibotb pushed a commit to sunhaibotb/flink that referenced this pull request May 8, 2019
…ed logical plan for simple group aggregate on stream (apache#8110)
tianchen92 pushed a commit to tianchen92/flink that referenced this pull request May 13, 2019
…ed logical plan for simple group aggregate on stream (apache#8110)
4 participants