-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add storage api support for reading table for query function in bigquery #22432
Add storage api support for reading table for query function in bigquery #22432
Conversation
I don't think exposing API flag is a good idea. It's an internal implementation users shouldn't care. Why not switch those API automatically based on |
Hi @ebyhr, I tried running query on
In all the cases
|
I am trying to see the possibility to use of |
08f7fc8
to
da08ab5
Compare
Hi @ebyhr @Praveen2112, I have removed the additional parameter approach. Now I am using Storage API internally for all the native query except for the case when there is duplicate column. In terms of performance, for the cases where the resultset output is less the native query takes more time now than existing solution(REST_API). I tried using |
/test-with-secrets sha=da08ab54b8d950b2be3a78ee893833aaa8b58a0f |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/9710216273 |
Could you confirm CI failure? It looks related to this change. |
da08ab5
to
9b1cd6e
Compare
/test-with-secrets sha=9b1cd6e6fbfa2244f0bb420e949003b684756b1f |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9710992320 |
Let me remove |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryRelationHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
9b1cd6e
to
2c702ae
Compare
Thanks @ebyhr for the review. Addressed comments. |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryQueryRelationHandle.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
2c702ae
to
94d6f30
Compare
94d6f30
to
1cd3b48
Compare
@Praveen2112 @ebyhr could you please run this PR with secrets? |
/test-with-secrets sha=1cd3b48a7b3550ecee4e9057330e7ec9637ccf0e |
The CI workflow run with tests that require additional secrets finished as failure: https://github.com/trinodb/trino/actions/runs/9793194786 |
Thanks @ebyhr @Praveen2112 for the review. Addressed comments. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me except for a logic to build BigQuery SELECT statement in split manager.
List<String> projectedColumnsNames = getProjectedColumnNames(columns); | ||
|
||
String query = filter | ||
.map(whereClause -> "SELECT " + String.join(",", projectedColumnsNames) + " FROM (" + bigQueryQueryRelationHandle.getQuery() + ") WHERE " + whereClause) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
String.join(",", projectedColumnsNames)
This may create a invalid query SELECT FROM (...)
if projectedColumnsNames is an empty right? I'm not sure if such situations really happen though. Can we change to *
or throw an exception?
Also, why is projectedColumnsNames
unused when filter
doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if such situations really happen though
In this query SELECT count(*) FROM TABLE(bigquery.system.query(query => 'SELECT 1'))
, Projected column names is empty. Thanks for pointing it out. Fixing this. I think I can use createEmptyProjection
in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also using projected column names fails when the query does not have column name provided and gets generated internally SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT 1'))
.
Caused by: com.google.cloud.bigquery.BigQueryException: Unrecognized name: f0_ at [1:8]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also using projected column names fails when the query does not have column name provided and gets generated internally SELECT * FROM TABLE(bigquery.system.query(query => 'SELECT 1')).
For this reason, I am going back to use *
instead of projectedColumnNames
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CC: @Praveen2112
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Show resolved
Hide resolved
1cd3b48
to
4e1c1f2
Compare
/test-with-secrets sha=4e1c1f2a16112d6297ced970c44de41b4c31b014 |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9805144765 |
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQuerySplitManager.java
Outdated
Show resolved
Hide resolved
plugin/trino-bigquery/src/main/java/io/trino/plugin/bigquery/BigQueryClient.java
Outdated
Show resolved
Hide resolved
4e1c1f2
to
38499b7
Compare
Thanks @ebyhr. Addressed comments. |
/test-with-secrets sha=38499b7253a36d0ee815b0f64a753bcfe8e1b836 |
The CI workflow run with tests that require additional secrets has been started: https://github.com/trinodb/trino/actions/runs/9834807023 |
Description
This PR now uses the storage api to execute native bigquery query.
This PR adds an optional parametermode
for BigQuery query table function, which indicates which api should be used to run the native bigquery query.By default when mode is not provided It will useREST_API
to run the native bigquery query.Parametermode
can be set as1.REST_API
(More Details: https://cloud.google.com/bigquery/docs/reference/rest)2.STORAGE_API
(More Details: https://cloud.google.com/bigquery/docs/reference/storage).Additional context and related issues
While implementing this feature, we see that when native query result set is small then the
REST_API
performs better thanSTORAGE_API
. But when native query result set is large then theSTORAGE_API
performs better thanREST_API
.~
Here are some numbers:
Using Storage API has an overhead of creating a temporary cached table. This approach is similar to how the Materialized view is implemented in Bigquery.
Release notes
(X) Release notes are required, with the following suggested text: