Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use attachValues in SQL #11259

Merged
merged 7 commits into from
Apr 5, 2020
Merged

Conversation

reuvenlax
Copy link
Contributor

@reuvenlax reuvenlax commented Mar 29, 2020

Use the more efficient attachValues when building rows in SQL. addValues does a deep copy of the parameter, while addValue does not. We have also recently made API improvements that makes addValues slightly less efficient, so this allows SQL to bypass that.

We are not using attachValues on any of the IOs. This is strictly for internal rows generated by SQL.

@reuvenlax reuvenlax requested a review from apilloud March 29, 2020 23:24
@reuvenlax
Copy link
Contributor Author

run sql postcommit

@alexvanboxel
Copy link
Contributor

alexvanboxel commented Mar 30, 2020

Use the more efficient attachValues when building rows in SQL. addValues does a deep copy of the parameter, while addValue does not. We have also recently made API improvements that makes addValues slightly less efficient, so this allows SQL to bypass that.

We are not using attachValues on any of the IOs. This is strictly for internal rows generated by SQL.

I've changed the description from while attachValues does not too while addValue does not. (typo)

Copy link
Member

@apilloud apilloud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this change the validation in addValue would ensure output types were correct, so I'm not sure that our tests explicitly check for that failure case. Can you verify the tests still catch regressions where incorrect types are produced by SQL?

LGTM

@apilloud
Copy link
Member

Actually it looks like this completely broke BeamCalcRel. You should fix that too.

@reuvenlax
Copy link
Contributor Author

run sql postcommit

@apilloud
Copy link
Member

apilloud commented Apr 1, 2020

FYI, I was going through our internal test failures and found a bunch that will probably be much more difficult to debug after this change. I included a sample of the current failures below, it would be nice to have a way to turn SchemaVerification on for testing.

Caused by: java.lang.IllegalArgumentException: FieldType{typeName=INT64, nullable=false, logicalType=null, collectionElementType=null, mapKeyType=null, mapValueType=null, rowSchema=null, metadata={}} is not nullable in Array field $arrayx
	at org.apache.beam.sdk.values.SchemaVerification.verifyArray(SchemaVerification.java:101)
	at org.apache.beam.sdk.values.SchemaVerification.verifyFieldValue(SchemaVerification.java:65)
	at org.apache.beam.sdk.values.SchemaVerification.verifyRowValues(SchemaVerification.java:57)
	at org.apache.beam.sdk.values.Row$Builder.build(Row.java:673)
	at org.apache.beam.sdk.extensions.sql.zetasql.BeamZetaSqlCalcRel$CalcFn.processElement(BeamZetaSqlCalcRel.java:195)

@reuvenlax
Copy link
Contributor Author

@apilloud I added a pipeline option that can be used to enable more verbose Row checking. Dataflow can enable this on its internal tests.

@apilloud
Copy link
Member

apilloud commented Apr 2, 2020

LGTM

@TheNeuralBit
Copy link
Member

Run SQL PostCommit

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's concerning that none of the checks caught this error. Is SQL PreCommit not building this code? Hopefully SQL PostCommit catches it

Row row =
verifyRowValues
? Row.withSchema(outputSchema).addValues(fieldValues).build()
: Row.withSchema(outputSchema).attachValues(fieldValues);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is missing a .build()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pr/10883 changed attachValues() to return a Row, so build() not needed.

Row outputRow =
verifyRowValues
? Row.withSchema(outputSchema).addValues(values).build()
: Row.withSchema(outputSchema).attachValues(values);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Row row =
verifyValues
? Row.withSchema(schema).addValues(objects).build()
: Row.withSchema(schema).attachValues(objects);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

@TheNeuralBit
Copy link
Member

PostCommit doesn't get it either..

Copy link
Member

@TheNeuralBit TheNeuralBit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah nevermind, the tests are right and I am wrong. This API changed in #10883

@reuvenlax reuvenlax merged commit ed5a358 into apache:master Apr 5, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants