[FLINK-15487][table] Update code generation for new type inference #10960

twalthr · 2020-01-29T14:09:40Z

What is the purpose of the change

This updates the code generation for the new type inference and thus completes FLINK-15487. Scala functions work with the types supported by the planner. The tests added in this PR only cover basic behavior. We will need more tests per data type, but that is a follow-up issue.

Brief change log

Update the code generation
Fix a couple of bugs and shortcomings

Verifying this change

FunctionITCase tests the implementation.

Does this pull request potentially affect one of the following parts:

Dependencies (does it add or upgrade a dependency): no
The public API, i.e., is any changed class annotated with @Public(Evolving): yes
The serializers: no
The runtime per-record code paths (performance sensitive): no
Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Yarn/Mesos, ZooKeeper: no
The S3 file system connector: no

Documentation

Does this pull request introduce a new feature? yes
If yes, how is the feature documented? JavaDocs

flinkbot · 2020-01-29T14:12:48Z

Thanks a lot for your contribution to the Apache Flink project. I'm the @flinkbot. I help the community
to review your pull request. We will use this comment to track the progress of the review.

Automated Checks

Last check on commit ef2e385 (Wed Jan 29 14:12:48 UTC 2020)

Warnings:

No documentation files were touched! Remember to keep the Flink docs up to date!

Mention the bot in a comment to re-run the automated checks.

Review Progress

❓ 1. The [description] looks good.
❓ 2. There is [consensus] that the contribution should go into Flink.
❓ 3. Needs [attention] from.
❓ 4. The change fits into the overall [architecture].
❓ 5. Overall code [quality] is good.

Please see the Pull Request Review Guide for a full explanation of the review process.

The Bot is tracking the review progress through labels. Labels are applied according to the order of the review items. For consensus, approval by a Flink committer or PMC member is required.

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot approve description to approve one or more aspects (aspects: description, consensus, architecture and quality)
@flinkbot approve all to approve all aspects
@flinkbot approve-until architecture to approve everything until architecture
@flinkbot attention @username1 [@username2 ..] to require somebody's attention
@flinkbot disapprove architecture to remove an approval you gave earlier

twalthr · 2020-01-29T14:13:38Z

CC @JingsongLi

flinkbot · 2020-01-29T14:28:24Z

CI report:

ef2e385 Travis: SUCCESS Azure: FAILURE
54efb59 Travis: CANCELED Azure: FAILURE
007a150 Travis: SUCCESS Azure: FAILURE

Bot commands

The @flinkbot bot supports the following commands:

@flinkbot run travis re-run the last Travis build
@flinkbot run azure re-run the last Azure build

JingsongLi

Thanks @twalthr, I left some comments.

JingsongLi · 2020-01-31T08:46:08Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

+ /**
+ * Returns a term for representing the given class in Java code.
+ */
+ def typeTerm(clazz: Class[_]): String = {


How about just getCanonicalName?

Canonical name can be null. Class.getName always returns a value.

Hmm, I am also in favor of @JingsongLi's suggestion. As per the javadoc of getCanonicalName, I think that if it gives us null, we cannot use the term anyway.

We could add a check here that throws an exception stating that the given class does not have a canonical representation and thus cannot be used for code generation.

I will update it but in general if we want to relax the UDF constraints in the future, we might need to support anonymous classes in our utilities as well.
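A minimal Java sketch of the check discussed in this thread. The class TypeTerms and the exception message are assumptions made for illustration, not the actual CodeGenUtils code. It relies only on the documented behavior that Class.getCanonicalName() returns null for anonymous and local classes, while Class.getName() always returns a value.

```java
// Illustration only: TypeTerms is a hypothetical helper, not Flink code.
final class TypeTerms {

    /** Returns a term for the class that can be used in Java source, or fails. */
    static String typeTerm(Class<?> clazz) {
        // getCanonicalName() is null for anonymous and local classes.
        String canonicalName = clazz.getCanonicalName();
        if (canonicalName == null) {
            throw new IllegalArgumentException(
                "Class '" + clazz.getName()
                    + "' has no canonical name and cannot be used in generated code.");
        }
        return canonicalName;
    }

    public static void main(String[] args) {
        // Nested class: getName() uses '$', the canonical name uses '.'.
        System.out.println(java.util.Map.Entry.class.getName());
        System.out.println(typeTerm(java.util.Map.Entry.class));
        // Anonymous class: no canonical name exists.
        Object anonymous = new Object() {};
        System.out.println(anonymous.getClass().getCanonicalName());
    }
}
```

The trade-off raised above remains: failing fast keeps the generated code valid, but it would also reject anonymous classes if the UDF constraints are relaxed later.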

JingsongLi · 2020-01-31T09:09:08Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

+ }
+ // extract null term from result term
+ if (sourceType.getConversionClass.isPrimitive) {
+ generateNonNullField(sourceType.getLogicalType, internalResultTerm)


Replace sourceType.getLogicalType with the reused field fromDataTypeToLogicalType(sourceType)?

JingsongLi · 2020-01-31T09:19:59Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

+ : GeneratedExpression = {
+ // convert external source type to internal format
+ val internalResultTerm = if (isInternalClass(sourceType)) {
+ s"(${boxedTypeTermForType(fromDataTypeToLogicalType(sourceType))}) $externalTerm"


We need this cast because a UDF can return Object while its DataType has an internal conversion class.
Sometimes there is a gap between the conversion class and the actual return class of the Java function. It is safe to add this cast, and I think the JVM can optimize it so that it does not affect performance.

I think we can add a comment here.

I added a comment. But actually the gap should be filled by the converters. I think the core problem is that int and Integer are handled the same in the converters, even though the latter can be null and needs unboxing.

The cast is actually not necessary anymore because of the cast in BridgingSqlFunctionCallGen. I removed it for now.

JingsongLi · 2020-01-31T09:27:47Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

 } else {
- val eTerm = boxedTypeTermForExternalType(t)
+ val iTerm = boxedTypeTermForType(fromDataTypeToLogicalType(targetType))
+ val eTerm = boxedTypeTermForExternalType(targetType)


If you don't mind, consider a hotfix to modify the boxedTypeTermForExternalType, because getConversionClass can't be null now.

JingsongLi · 2020-01-31T09:34:02Z

...able-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/GenerateUtils.scala

 * @return internal unboxed field representation
 */
 def generateInputFieldUnboxing(
 ctx: CodeGeneratorContext,
 fieldType: LogicalType,
- fieldTerm: String): GeneratedExpression = {
+ fieldTerm: String,


fieldTerm -> inputTerm?

I think the method should be refactored in the future. In Flink planner, this method was intended to perform genToInternalIfNeeded. Now the concepts are mixed up.

JingsongLi · 2020-01-31T09:34:34Z

...able-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/GenerateUtils.scala

 * @return internal unboxed field representation
 */
 def generateInputFieldUnboxing(
 ctx: CodeGeneratorContext,
 fieldType: LogicalType,
- fieldTerm: String): GeneratedExpression = {
+ fieldTerm: String,
+ unboxingTerm: String)


unboxingTerm -> outputTerm or outputUnboxingTerm?

JingsongLi · 2020-01-31T09:50:03Z

...src/main/scala/org/apache/flink/table/planner/codegen/calls/BridgingSqlFunctionCallGen.scala

+ val externalResultCasting = if (externalResultClass == externalResultClassBoxed) {
+ s"($externalResultTypeTerm)"
+ } else {
+ s"($externalResultTypeTerm) (${typeTerm(externalResultClassBoxed)})"


Can we check the method return class too? If it returns a primitive class, we don't need to add this cast.

We cannot determine the return class of the method at this point. The JVM will pick the right method to call with the given signature. The JVM should be smart enough to remove the cast here.
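To make the two-step cast concrete, here is a hedged sketch in plain Java. ResultCasting, castPrefix and unboxInt are hypothetical names chosen for illustration. The sketch only mirrors the shape of the snippet under review: one cast when the class is already a reference type, and a cast to the boxed class followed by a cast to the primitive otherwise.

```java
import java.util.HashMap;
import java.util.Map;

// Illustration only: ResultCasting and its helpers are hypothetical, not Flink code.
final class ResultCasting {

    private static final Map<Class<?>, Class<?>> BOXED = new HashMap<>();

    static {
        BOXED.put(boolean.class, Boolean.class);
        BOXED.put(byte.class, Byte.class);
        BOXED.put(short.class, Short.class);
        BOXED.put(char.class, Character.class);
        BOXED.put(int.class, Integer.class);
        BOXED.put(long.class, Long.class);
        BOXED.put(float.class, Float.class);
        BOXED.put(double.class, Double.class);
    }

    /** Returns the cast prefix to place in front of a result term. */
    static String castPrefix(Class<?> resultClass) {
        Class<?> boxed = BOXED.getOrDefault(resultClass, resultClass);
        if (resultClass == boxed) {
            // Reference type: a single cast is enough.
            return "(" + resultClass.getCanonicalName() + ")";
        }
        // Primitive target: cast to the boxed class first, then unbox.
        return "(" + resultClass.getCanonicalName() + ") (" + boxed.getCanonicalName() + ")";
    }

    /** The same two-step cast written out by hand for int. */
    static int unboxInt(Object result) {
        return (int) (Integer) result;
    }
}
```

For example, castPrefix(int.class) yields "(int) (java.lang.Integer)", while castPrefix(String.class) yields "(java.lang.String)".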

JingsongLi · 2020-02-01T03:30:49Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

+ }
+ // extract null term from result term
+ if (sourceClass.isPrimitive) {
+ generateNonNullField(sourceType, internalResultTerm)


I am concerned about the case where the user declares a primitive conversion class but the actual Java method returns a non-primitive value, which could be null.
What kind of error would users get in that case? Maybe we can add a test?

Users will get a NullPointerException, which is expected if people override the default type inference and implement advanced features. Usually, this exception should not happen as people will use the extraction + annotations.
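The failure mode can be reproduced in plain Java without any Flink classes. In this minimal sketch, PrimitiveResultDemo and evalReturningNull are hypothetical stand-ins for a function whose result is treated as a primitive int although the method actually returns a nullable Integer.

```java
// Illustration only: hypothetical stand-in for a UDF, not Flink code.
final class PrimitiveResultDemo {

    /** Returns a boxed value that happens to be null. */
    static Integer evalReturningNull() {
        return null;
    }

    /** Returns true if treating the result as a primitive int fails. */
    static boolean unboxingFails() {
        try {
            // Auto-unboxing a null Integer throws a NullPointerException.
            int result = evalReturningNull();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("unboxing null failed: " + unboxingFails());
    }
}
```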

dawidwys

Really impressive work. The usage of ScalarFunctions looks splendid!

dawidwys · 2020-01-31T14:53:14Z

...nk-table-common/src/main/java/org/apache/flink/table/types/extraction/DataTypeExtractor.java

@@ -305,8 +305,12 @@ private DataType extractDataTypeOrError(DataTypeTemplate template, List<Type> ty
 DataTypeTemplate template,
 List<Type> typeHierarchy,
 Type type) {
+ // byte arrays have higher priority than regular arrays


nit: How about a comment like: prefer VARBINARY/BYTES() over ARRAY(TINYINT) for byte[] instead?
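The ordering concern behind this comment can be sketched as follows. ArrayKind and its string labels are assumptions for illustration only and do not reflect how DataTypeExtractor is implemented. The point is that the byte[] check has to run before the generic array check, because byte[] also satisfies isArray().

```java
// Illustration only: ArrayKind is hypothetical; the labels are plain strings, not Flink data types.
final class ArrayKind {

    static String describe(Class<?> clazz) {
        // byte[] must be checked first, otherwise it falls into the generic array branch.
        if (clazz == byte[].class) {
            return "BYTES";
        }
        if (clazz.isArray()) {
            return "ARRAY(" + clazz.getComponentType().getSimpleName() + ")";
        }
        return "OTHER";
    }

    public static void main(String[] args) {
        System.out.println(describe(byte[].class));
        System.out.println(describe(int[].class));
    }
}
```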

dawidwys · 2020-01-31T15:08:47Z

...on/src/main/java/org/apache/flink/table/types/extraction/utils/FunctionMappingExtractor.java

 } else {
- return type;
+ return FunctionArgumentTemplate.of(type);


nit: two separate methods?

// check for input group before start extracting a data type
return tryExtractInputGroup(method, i)
    // extract a concrete data type
    .orElseGet(() -> extractUsingExtractor(typeFactory, function, method, i));

Thanks, I also fixed two other bugs on the way.
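A self-contained sketch of the suggested shape, using java.util.Optional. The method names follow the suggestion above, but their bodies, parameters, and the string results are hypothetical placeholders. The fallback passed to orElseGet only runs when no input group is found.

```java
import java.util.Optional;

// Illustration only: simplified, hypothetical versions of the two extraction steps.
final class ArgumentExtraction {

    static int fallbackCalls = 0;

    /** First step: returns a value only if the hint names an input group. */
    static Optional<String> tryExtractInputGroup(String hint) {
        return "ANY".equals(hint) ? Optional.of("InputGroup.ANY") : Optional.empty();
    }

    /** Second step: stands in for extracting a concrete data type. */
    static String extractUsingExtractor(Class<?> parameterType) {
        fallbackCalls++;
        return parameterType.getSimpleName();
    }

    static String extractArgument(String hint, Class<?> parameterType) {
        // check for input group before extracting a data type
        return tryExtractInputGroup(hint)
            // extract a concrete data type only if no input group was found
            .orElseGet(() -> extractUsingExtractor(parameterType));
    }
}
```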

dawidwys · 2020-01-31T15:21:41Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

+ /**
+ * Returns a term for representing the given class in Java code.
+ */
+ def typeTerm(clazz: Class[_]): String = {


Hmm, I am also in favor of @JingsongLi's suggestion. As per the javadoc of getCanonicalName, I think that if it gives us null, we cannot use the term anyway.

We could add a check here that throws an exception stating that the given class does not have a canonical representation and thus cannot be used for code generation.

dawidwys · 2020-01-31T15:22:28Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

- } else {
- t.getConversionClass.getCanonicalName
- }
+ t.getConversionClass.getCanonicalName


Use typeTerm(t.getConversionClass)

BTW, why is the method called boxed...? We don't perform any boxing here.

Seems to be legacy. I removed it entirely.

dawidwys · 2020-01-31T15:33:46Z

...table-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/CodeGenUtils.scala

- val iTerm = boxedTypeTermForType(fromDataTypeToLogicalType(t))
- if (isConverterIdentity(t)) {
- s"($iTerm) $term"
+ def genToExternal(


nit: For future reference. I feel that we can unify the genToExternal/Internal methods.

I have the same feeling. But this should be a follow-up task. Currently, it is still used in a couple of places.

dawidwys · 2020-02-02T16:56:51Z

...able-planner-blink/src/main/scala/org/apache/flink/table/planner/codegen/GenerateUtils.scala

- fieldTerm: String): GeneratedExpression = {
+ inputType: LogicalType,
+ inputTerm: String,
+ inputUnboxingTerm: String)


I don't fully understand this change, but do we really need that parameter? Shouldn't we only ever check for null and assign value from the same input?

With the old logic we were performing the .toInternal() conversion twice: once for the null check and once for the assignment. This improves the runtime code.
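The difference can be illustrated at runtime with a small, hedged sketch. ConvertOnce and its toInternal method are hypothetical stand-ins that merely count calls. They are not the generated code or a Flink converter.

```java
// Illustration only: counts how often a hypothetical conversion runs.
final class ConvertOnce {

    static int calls = 0;

    /** Stand-in for a converter call such as toInternal(). */
    static Integer toInternal(Object external) {
        calls++;
        return (Integer) external;
    }

    /** Old shape: converts for the null check and again for the assignment. */
    static int convertTwice(Object external) {
        boolean isNull = toInternal(external) == null;
        return isNull ? -1 : toInternal(external);
    }

    /** New shape: converts once and reuses the converted value. */
    static int convertOnce(Object external) {
        Integer converted = toInternal(external);
        boolean isNull = converted == null;
        return isNull ? -1 : converted;
    }
}
```

For a non-null input, convertTwice increments the counter by two and convertOnce by one, while both return the same value.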

dawidwys · 2020-02-02T17:08:28Z

...er-blink/src/test/java/org/apache/flink/table/planner/runtime/stream/sql/FunctionITCase.java

+ @Override
+ public TypeInference getTypeInference(DataTypeFactory typeFactory) {
+ return TypeInference.newBuilder()
+ .outputTypeStrategy(TypeStrategies.argument(0))


Shall we maybe change the inputTypeStrategy to be also required instead of a WILDCARD? I think we should not assume anything for users that end up defining their own type inference strategies.

Input validation is rather optional. We are mostly interested in the return type which is why this is the only mandatory attribute.

dawidwys · 2020-02-02T17:19:42Z

...er-blink/src/test/java/org/apache/flink/table/planner/runtime/stream/sql/FunctionITCase.java

+ e,
+ hasMessage(
+ equalTo(
+ "Could not find an implementation method that matches the following " +


nit: Shall we print the class name of the UDF to which the function resolved, along with the available eval methods?

I think users would appreciate that.
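One possible way to build such a message with plain reflection is sketched below. EvalMethodReport, MyFunction and the message wording are hypothetical. The sketch does not reproduce the actual exception text in FunctionITCase.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Illustration only: hypothetical helper that lists the public eval methods of a class.
final class EvalMethodReport {

    /** Hypothetical function class with two eval overloads. */
    static class MyFunction {
        public int eval(int a) {
            return a;
        }

        public String eval(String s) {
            return s;
        }
    }

    static String describe(Class<?> functionClass) {
        String signatures = Arrays.stream(functionClass.getMethods())
            .filter(m -> m.getName().equals("eval"))
            .map(m -> "eval("
                + Arrays.stream(m.getParameterTypes())
                    .map(Class::getSimpleName)
                    .collect(Collectors.joining(", "))
                + ")")
            // getMethods() has no guaranteed order, so sort for a stable message.
            .sorted()
            .collect(Collectors.joining(", "));
        return "Function class '" + functionClass.getName() + "' provides: " + signatures;
    }

    public static void main(String[] args) {
        System.out.println(describe(MyFunction.class));
    }
}
```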

twalthr · 2020-02-04T15:41:06Z

Thanks for the review @JingsongLi and @dawidwys. I hope I have addressed most of the feedback. I will merge this once Travis gives the green light. I will open a follow-up issue for more extensive tests for all data types.

twalthr · 2020-02-04T15:41:37Z

@flinkbot run travis

JingsongLi

Thanks @twalthr, code generation looks good to me.

[hotfix][table-common] Fix invalid BYTES data type extraction

05c3e00

rmetzger added the review=description? label Jan 29, 2020

dawidwys self-assigned this Jan 29, 2020

rmetzger added component=TableSQL/Planner component=TableSQL/API labels Jan 29, 2020

JingsongLi reviewed Jan 31, 2020

JingsongLi reviewed Feb 1, 2020

dawidwys approved these changes Feb 2, 2020

twalthr force-pushed the FLINK-15487_3 branch from 54efb59 to 1a399c8 Compare February 4, 2020 15:34

twalthr added 6 commits February 4, 2020 16:37

fixup

33b981f

[hotfix][table] Fix various type inference issues

5c980ee

fixup

f3e8209

[hotfix][table-common] Add 'use argument' type strategy

6979432

[FLINK-15487][table] Update code generation for new type inference

86a1882

Feedback addressed

007a150

twalthr force-pushed the FLINK-15487_3 branch from 1a399c8 to 007a150 Compare February 4, 2020 15:37

JingsongLi approved these changes Feb 5, 2020

twalthr closed this in 5e6e851 Feb 5, 2020
