Implement incremental refresh for single-table, predicate-only MVs #20959

Merged
sopel39 merged 6 commits into trinodb:master from incr_ref on Jun 20, 2024

Conversation

marton-bod (Contributor) commented Mar 6, 2024

Description

In the Iceberg connector, we currently do a full refresh of MVs in all cases: we drop all the existing data from the MV storage table and then insert all of the data again.

When possible, we could be smarter during refresh. While incrementally refreshing an MV can be tricky for many operation types, such as joins, aggregations, or window functions, there are some easier wins, such as single-table, predicate-only queries. More complicated cases can be tackled in subsequent implementation phases.

In this first phase of implementation, this PR introduces incremental refresh with the following limitations:

  • no joins
  • no group by
  • no windowing
  • no select-distinct (e.g. select distinct department from ...)
  • no select function calls (e.g. select sum(price), avg(price) from ...)
  • the base table contains only appends between fromSnapshot (time of the last refresh) and currentSnapshot

(Basically nothing that would require touching/rewriting the already persisted data in the MV.)

For example, a type of MV supported for incremental refresh in this phase:

CREATE MATERIALIZED VIEW catalog.schema.mv AS
SELECT name, CASE location 
  WHEN 'UK' THEN 'United Kingdom' 
  WHEN 'US' THEN 'United States'
  ...
FROM catalog.schema.base_table 
WHERE id IN ('8e374aa5a5d8d3720458fe9c5363c35200', 'd08963f57c94e41fb7c95cd801dd85b200', ...)
AND last_active_date > DATE '2024-02-01'

Approach:

  1. During refresh time, LogicalPlanner checks the plan tree and suggests either incremental or full refresh to the connector
  2. In the connector, IcebergMetadata#beginRefreshMaterializedView() takes this information from the engine, checks a few additional things (e.g. presence of fromSnapshot for all base tables) and makes the final decision
  3. When IcebergSplitManager creates the table scan, it creates either a TableScan (full refresh) or an IncrementalAppendScan (incremental refresh), using the fromSnapshot value. It can still fall back to full refresh if fromSnapshot has already expired or the refresh window contains any deletes/overwrites (see the sketch after this list).
  4. If it's an incremental refresh, we skip the storage table truncation during IcebergMetadata#finishRefreshMaterializedView() and perform an append-only commit.
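
To illustrate step 3, here is a minimal sketch (not the actual connector code) of how the Iceberg API can plan an append-only scan between the two snapshots; the class, method, and variable names are hypothetical:

```java
import org.apache.iceberg.FileScanTask;
import org.apache.iceberg.IncrementalAppendScan;
import org.apache.iceberg.Table;
import org.apache.iceberg.io.CloseableIterable;

final class IncrementalRefreshSketch
{
    // Hypothetical illustration of step 3: plan only the files appended
    // after the snapshot recorded at the time of the last MV refresh.
    static CloseableIterable<FileScanTask> planIncrementalScan(Table baseTable, long fromSnapshotId, long currentSnapshotId)
    {
        IncrementalAppendScan scan = baseTable.newIncrementalAppendScan()
                .fromSnapshotExclusive(fromSnapshotId) // snapshot captured at the last refresh
                .toSnapshot(currentSnapshotId);        // snapshot at the time of this refresh
        // The connector checks separately (step 3 above) whether fromSnapshot has expired or the
        // refresh window contains deletes/overwrites, and falls back to a full TableScan in that case.
        return scan.planFiles();
    }
}
```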

Additional context and related issues

Partially solves issue: #18673

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# Iceberg

* Add support for incremental refresh for basic materialized views in the Iceberg connector. ({issue}`20959`)

@cla-bot cla-bot bot added the cla-signed label Mar 6, 2024
@github-actions github-actions bot added the iceberg Iceberg connector label Mar 6, 2024
@marton-bod changed the title from rdar:https://123628905 to Implement incremental refresh for single-table, predicate-only MVs on Mar 6, 2024
@marton-bod marton-bod force-pushed the incr_ref branch 2 times, most recently from 8ff865b to 53e7cc1 Compare March 7, 2024 11:14
findepi (Member) commented Mar 7, 2024

@sopel39 @alexjo2144 please take a look

marton-bod (Contributor, Author) commented:

@findepi @sopel39 @alexjo2144 Hi team, could you please take a quick look at this PR? Thank you very much!

github-actions bot commented May 3, 2024

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

@github-actions github-actions bot added the stale label May 3, 2024
marton-bod (Contributor, Author) commented:

@martint as discussed in our zoom call, I've refactored this PR so that the analyzer passes the information to the connector to make the incremental/full refresh decision, as opposed to using table properties. Please take a look and let me know what you think. Thank you!

sopel39 (Member) left a comment: lgtm % comments

@@ -86,6 +86,189 @@ protected String getStorageMetadataLocation(String materializedViewName)
return table.getParameters().get(METADATA_LOCATION_PROP);
}

sopel39 (Member):

please also add a test with update/delete. Part of scenario would be that incremental refresh should work again after compaction.

marton-bod (Contributor, Author):

Added a new test case which verifies that we fall back to FULL refresh when delete files are present. As for compaction reenabling incremental refresh, please see my comment below.

sopel39 (Member):

> As for compaction reenabling incremental refresh, please see my comment below

Would a test like that work:

  1. initial (full) refresh with some data
  2. add one row to source table
  3. incremental refresh
  4. compaction
  5. full refresh
  6. add one row to source table
  7. we are back to incremental refresh

marton-bod (Contributor, Author):

This is good, I would just modify step 4:

  1. initial (full) refresh with some data
  2. add one row to source table
  3. incremental refresh
  4. update or delete one row
  5. full refresh
  6. add one row to source table
  7. we are back to incremental refresh

marton-bod (Contributor, Author) commented Jun 12, 2024:

In the original test case you proposed, step 5 shouldn't do any refresh (essentially a noop), because step 4 (compaction) didn't add any new rows, it just bin-packed the existing rows into fewer files. Nonetheless, that's also something worthwhile to test, so I added an extra test to verify this assumption.

cla-bot commented Jun 5, 2024:

Thank you for your pull request and welcome to our community. We could not parse the GitHub identity of the following contributors: Marton Bod.
This is most likely caused by a git client misconfiguration; please make sure to:

  1. Check if your git client is configured with an email to sign commits: git config --list | grep email
  2. If not, set it up using git config --global user.email [email protected]
  3. Make sure that the git commit email is configured in your GitHub account settings, see https://github.com/settings/emails

sopel39 (Member) left a comment: lgtm % comments

* {@code refreshType} is a signal from the engine to the connector indicating whether the MV refresh could be done incrementally or only fully, based on the plan.
* The connector is not obligated to perform the refresh in the fashion prescribed by {@code refreshType}; it is merely a hint from the engine.
*/
default ConnectorInsertTableHandle beginRefreshMaterializedView(ConnectorSession session, ConnectorTableHandle tableHandle, List<ConnectorTableHandle> sourceTableHandles, RetryMode retryMode, RefreshType refreshType)
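
For illustration only, a connector override might consume this hint roughly as follows (a method-level sketch mirroring the excerpt above; hasKnownFromSnapshot and beginInsertForRefresh are hypothetical helpers, not code from this PR):

```java
@Override
public ConnectorInsertTableHandle beginRefreshMaterializedView(
        ConnectorSession session,
        ConnectorTableHandle tableHandle,
        List<ConnectorTableHandle> sourceTableHandles,
        RetryMode retryMode,
        RefreshType refreshType)
{
    // The engine only suggests a refresh type; the connector may downgrade
    // INCREMENTAL to FULL, e.g. when a source table has no known fromSnapshot.
    boolean incremental = refreshType == RefreshType.INCREMENTAL
            && sourceTableHandles.stream().allMatch(this::hasKnownFromSnapshot); // hypothetical helper
    return beginInsertForRefresh(session, tableHandle, incremental); // hypothetical helper
}
```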
sopel39 (Member):

@martint do you want to ack on SPI change?

@marton-bod marton-bod force-pushed the incr_ref branch 2 times, most recently from 75521bb to e2aef8d Compare June 12, 2024 22:44
When an MV refresh is executing, the planner suggests either incremental or full refresh to the connector. In this first phase, incremental refresh is suggested only when the plan tree contains nothing other than Scan/Filter/Project nodes; otherwise, full refresh is suggested. The Iceberg connector can act on this signal and use IncrementalAppendScan to scan only the 'delta' records and append them to the MV storage table (without truncation).
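
A rough sketch of that plan check (hypothetical, not the exact code from this PR): incremental refresh is only suggested when every node in the refresh query's plan is a table scan, filter, or project.

```java
import io.trino.sql.planner.plan.FilterNode;
import io.trino.sql.planner.plan.PlanNode;
import io.trino.sql.planner.plan.ProjectNode;
import io.trino.sql.planner.plan.TableScanNode;

final class IncrementalRefreshCheckSketch
{
    // Hypothetical sketch: walk the plan tree and allow only Scan/Filter/Project nodes.
    static boolean isIncrementalRefreshEligible(PlanNode node)
    {
        if (!(node instanceof TableScanNode || node instanceof FilterNode || node instanceof ProjectNode)) {
            return false;
        }
        return node.getSources().stream().allMatch(source -> isIncrementalRefreshEligible(source));
    }
}
```
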
sopel39 (Member) left a comment: looks good! A couple of remaining comments.

package io.trino.spi;

/**
* Signifies whether a Materialized View refresh operation could be done incrementally or only fully.
sopel39 (Member):

For future reference: semantics of this will most likely change when we support full scope of incremental refreshes
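
For reference, a sketch of the presumable shape of the new SPI type under discussion (the actual definition added by this PR lives in io.trino.spi):

```java
package io.trino.spi;

/**
 * Signifies whether a materialized view refresh operation could be done incrementally or only fully.
 */
public enum RefreshType
{
    FULL,
    INCREMENTAL
}
```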

@sopel39 sopel39 merged commit 0eb5bfc into trinodb:master Jun 20, 2024
99 checks passed
@github-actions github-actions bot added this to the 451 milestone Jun 20, 2024
mosabua (Member) commented Jun 26, 2024:

@marton-bod are you planning to send a small separate PR to document this?

marton-bod (Contributor, Author) commented:

@mosabua yes, I'll post a PR with the proposed doc changes. Thanks for bringing it up

cccs-nik added a commit to CybercentreCanada/trino that referenced this pull request Jul 12, 2024
* Add additional test coverage for integer number to varchar coercer

Co-authored-by: Yiqun Zhang <[email protected]>

* Allow integer number to varchar coercion in ORC unpartitioned table

Co-authored-by: Yiqun Zhang <[email protected]>

* Allow integer number to varchar coercion in Parquet unpartitioned table

* Quote current test param when test fails

* Use enhanced switch statement

* Remove redundant explicit generics

* Extract a dedicated Double to Varchar Coercer for ORC

* Allow double to varchar coercion in Parquet unpartitioned table

* Merge Hudi query runners into one

* Add support for case insensitive name cache in BigQuery

Co-authored-by: regadas <[email protected]>

* Fix failure when reading signed timestamp with time zone stats in Delta Lake

* Avoid org.locationtech.jts:jts-core exclusion in Pinot

* Improve 449 release notes

* Add docs for Hive metadata caching config

* Remove exceptions related to older version of Hive

* Simplify BaseTestHiveCoercion

Older versions of Hive don't support the Float/Real type for the Parquet table format, but Hive 3.0+
doesn't have such a restriction.

* Allow float to varchar coercion in hive table

This coercion works for partitioned tables in all formats; for unpartitioned tables
it works for the ORC and Parquet table formats.

* Improve error message for resource groups

* Gracefully handle missing spooling output stats in FTE scheduler

It is rare but possible to get empty spooling output stats for a task which completed successfully.
This may happen if we observe the FINISHED task state based on a received TaskStatus but are later unable to
successfully retrieve TaskInfo. In such a case we build the final TaskInfo based on the last known taskInfo, just
updating the taskState field. The spooling output stats will not be present.
As we need this information in FTE mode, we need to fail such a task artificially.
* Add JDBC property to use current catalog in metadata if none provided

Some BI tools don't pass a `catalog` when calling the `DatabaseMetaData`
`getTables`, `getColumns` and `getSchemas` methods. This makes the JDBC
driver search across all catalogs, which can be expensive.

This commit introduces a new boolean connection property
`assumeNullCatalogMeansCurrentCatalog` (disabled by default) to be used
with such BI tools. If enabled the driver will try to use current
`catalog` of the JDBC connection when fetching Trino metadata like
`getTables`, `getColumns`, `getSchemas` if the `catalog` argument to
those methods is passed as `null`.

Co-authored-by: Rafał Połomka <[email protected]>
Co-authored-by: Ashhar Hasan <[email protected]>

* Make sealed version of Avro Logical Types supported natively

* Refactor block building decoder creation to connector specific class

* Remove unnecessary exclusions from Pinot

* Remove unnecessary, commented code

* Rewrite JUnit Assertions to AssertJ

Changed automatically by openrewrite recipe
```
<plugin>
    <groupId>org.openrewrite.maven</groupId>
    <artifactId>rewrite-maven-plugin</artifactId>
    <version>5.32.0</version>
    <configuration>
        <activeRecipes>
            <recipe>org.openrewrite.java.testing.assertj.JUnitToAssertj</recipe>
        </activeRecipes>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.openrewrite.recipe</groupId>
            <artifactId>rewrite-testing-frameworks</artifactId>
            <version>2.10.1</version>
        </dependency>
    </dependencies>
</plugin>
```

run
```
./mvnw rewrite:run -Dmaven.javadoc.skip=true -DskipTests=true -Dmaven.site.skip=true -Dmaven.artifact.threads=16 -T 1C -e -Dair.check.skip-all=true --no-snapshot-updates -pl '!:trino-product-tests,!:trino-product-tests-launcher,!:trino-server-rpm'
```

Some formatting was fixed manually for readability.

Then, rewrite remaining JUnit Assertions to AssertJ manually, because
some entries were not migrated by `./mvnw rewrite:run`.

* Rewrite TestNG Assertions to AssertJ

Changed automatically by openrewrite recipe
openrewrite/rewrite-testing-frameworks#520
```
<plugin>
    <groupId>org.openrewrite.maven</groupId>
    <artifactId>rewrite-maven-plugin</artifactId>
    <version>5.32.0</version>
    <configuration>
        <activeRecipes>
            <recipe>org.openrewrite.java.testing.assertj.TestNgToAssertj</recipe>
        </activeRecipes>
    </configuration>
    <dependencies>
        <dependency>
            <groupId>org.openrewrite.recipe</groupId>
            <artifactId>rewrite-testing-frameworks</artifactId>
            <version>2.11.0-SNAPSHOT</version>
        </dependency>
    </dependencies>
</plugin>
```

run
```
./mvnw rewrite:run -Dmaven.javadoc.skip=true -DskipTests=true -Dmaven.site.skip=true -Dmaven.artifact.threads=16 -T 1C -e -Dair.check.skip-all=true --no-snapshot-updates -pl '!:trino-server-rpm'
```

Some formatting was fixed manually for readability.

Then, rewrite remaining TestNG Assertions to AssertJ manually, because
some entries were not migrated by `./mvnw rewrite:run`.

* Prefer AssertJ assertions over TestNG, JUnit

* Update airbase to 157

* Update exec-maven-plugin to 3.3.0

* Pass additional type information in OrcColumn

OrcTypeKind captures only data types details while other information like precision
and scale are not a part of these information.

* Fix unsupported reads of decimals as varchar in parquet

* Allow decimal to varchar coercion for unpartitioned tables

* Fix grammar error in error message

* Simplify boolean comparison in getColumnMappingMode()

* Allow char to varchar coercion for hive tables

* Improve docs for HTTP server config

* Verify spark compatibility in native-fs special chars test

* Use Identifier.getValue when analyzing table function arguments

Otherwise, getPartitionBy and getOrderBy methods
in TableArgument SPI returns quoted column names.

* Avoid redundant null check before sizeOf()

* Remove unused method from BigQueryClient

* Convert DeleteFile to record in Iceberg

* Convert IcebergMaterializedViewDefinition to record

* Convert IcebergPartitioningHandle to record

* Convert IcebergWritableTableHandle to record

* Convert TrinoSortField to record in Iceberg

* Convert MemoryInsertTableHandle to record

* Convert MemoryDataFragment to record

* Test Memory connector config class

* Test reading liquid clustering tables in Delta Lake

* Add support for reading UniForm with Iceberg in Delta Lake

* Skip unsupported variant type in Delta Lake

* Reduce number of row groups in tests using small row groups

Avoids creating too many splits when split offsets are populated

* Fix detecting start of row group in parquet reader

* Populate split offsets when writing orc/parquet files in iceberg

Offsets can be used by readers to align splits with row-group/stripe boundaries

* Use timestampColumnMapping in ClickHouse

* Avoid using deprecated ClickHouseContainer class

* Set air.compiler.fail-warnings as true in ClickHouse

* Refactor TestMultipleDistinctAggregationToMarkDistinct

Remove deprecated symbol usage.

* Extract DistinctAggregationStrategyChooser

Extract the logic that determines direct distinct aggregation applicability,
so it can be reused in multiple optimizer rules.

* Use maximum NDV symbol to choose distinct aggregation strategy

* Move OptimizeMixedDistinctAggregations

Move class before refactoring to preserve history

* Support for multiple aggregations in OptimizeMixedDistinctAggregations

Adds support for multiple distinct aggregations in OptimizeMixedDistinctAggregations.

* Replace optimizer.optimize-mixed-distinct-aggregations

Replace optimizer.optimize-mixed-distinct-aggregations with a new
optimizer.distinct-aggregations-strategy value `pre_aggregate`. Also rename the corresponding config property optimizer.mark-distinct-strategy to
optimizer.distinct-aggregations-strategy and its values from NONE -> SINGLE_STEP and ALWAYS -> MARK_DISTINCT.

* Enable OptimizeMixedDistinctAggregations automatically

Use estimated aggregation source NDV and the number
of grouping keys to decide if pre-aggregate strategy
should be used for a given aggregation

* Remove unnecessary override method isRemotelyAccessible in IcebergSplit

* Add non-null check in MemoryConnector

* Fix example in test guideline

* Add support for TRUNCATE statement in memory connector

* Remove unsupported version check and usage in Cassandra

* Add support for truncate statement in Iceberg

* Enable TopN for non-textual types for clickhouse-connector

* Prevent OOM when expanding projection over BigQuery no columns scan

* Improve ternary operator formatting in BigQuerySplitManager

* Produce single Big Query split when row count is all that matters

Similar to what we do in other connectors. There is no reason to create
multiple splits just to return row count information.

* Use safe idiom of getting the only element

* Automatically configure BigQuery scan parallelism

* Cleanup in OrcTypeTranslator

Use enhanced switch statements and re-arrange code based on the source data type

* Fix coercion gaps for Hive tables in ORC format

There were a few differences in the coercions supported between partitioned
and unpartitioned tables for the ORC format.

* Make scheduling of remote accessible splits with addresses more strict

UniformNodeSelector or the FTE scheduler will schedule remotely accessible splits
on selected nodes if such nodes are available, and only fall back to other
nodes if the preferred nodes are no longer part of the cluster. A connector might have stale
node information when creating splits, which could result in choosing offline nodes.
Additionally, in FTE mode nodes can go down, so split addresses may no longer
be valid when the task is restarted.

Additionally, this commit simplifies UniformNodeSelector optimizedLocalScheduling,
which was hard to reason about and was not taking advantage of
recent improvements like adaptive split queue length.

Co-authored-by: Karol Sobczak <[email protected]>

* Enable query and task retries in MariaDbClient

Copy test overrides from 7b80852, as MariaDB also needs them.

* Remove unused function parameter

* Keep separate top-level classes in separate compilation units

It's considered an anti-pattern to have multiple non-nested classes in one
.java file. We even have a checker for that, but it was suppressed in
this case.

* Remove redundant cast

* Use anonymous variables

* Cleanup warnings on top-level

`@SuppressWarnings` should be applied selectively, not for the whole
compilation unit.

* Make helper methods static

* Remove redundant throws declaration

* Allow pinning to WebIdentityTokenCredentialsProvider

Allow users to only use the WebIdentityTokenCredentialsProvider instead of the default credentials provider chain.

* Extract STS client getter

* Allow pinning to WebIdentityTokenCredentialsProvider

Allow users to only use the WebIdentityTokenCredentialsProvider instead
of the default credentials provider chain.

* Extract constant to represent partition for non-partitioned Hudi table

Co-Authored-By: Yuya Ebihara <[email protected]>

* Small code fixes

* remove dead code
* fix comment typo

* Improve performance of json functions

Avoid allocating heap ByteBuffer used by InputStreamReader.

* Fix broken testTruncateTable in Iceberg Snowflake catalog

* Explicitly configure executorService for s3 multipartuploads

Previously the ForkJoin common pool, which is meant for CPU-bound operations, was used

* Increase s3.max-connections from AWS default of 50 to 500

Aligns native S3FileSystem with legacy TrinoS3FileSystem

* Verify body in doScheduleAsyncCleanupRequest

We observed rare cases where we got a non-JSON result in
doScheduleAsyncCleanupRequest, and the current logging does not provide
enough information to understand the root cause.

* Fix uncheckedCacheGet javadoc

The `uncheckedCacheGet` will propagate `ExecutionError` if `Cache.get`
throws it.

This reverts commit 3d0632d.

* Add Iceberg query runner with TASK retries enabled

* Add `exchange.azure.endpoint` configuration option

This option can be used instead of `exchange.azure.connection-string` and will use the default Azure credentials when set.

* Fix failure when reading non-primitive Parquet fields in Iceberg

* Skip getting stats on varbinary type in Delta

* Rename tests in TestDeltaLakeParquetStatisticsUtils

* Add support for stats on timestamp type in Delta Lake

* Fix format of `@DefunctConfig` values in Cassandra

* Add support for typeWidening(-preview) reader feature in Delta Lake

* Add test for HudiPlugin

* Run TestPhoenixConnectorTest in parallel

* Nessie: Support APIv2 client

By default, the Nessie API version will be inferred from the Nessie URI.
If the user has a custom URI which doesn't have version info in it,
they can configure `iceberg.nessie-catalog.client-api-version`.

* Update netty to 4.1.111.Final

* Update AWS SDK v2 to 2.26.0

* Update nessie to 0.90.2

* Update AWS SDK v1 to 1.12.741

* Update reactor-core to 3.6.7

* Update checker-qual to 3.44.0

* Update flyway to 10.15.0

* Update MongoDB to 5.1.1

* Update lucene-analysis-common to 9.11.0

* Update hudi to 0.15.0

* Update freemarker to 2.3.33

* Fix evictable cache invalidation race condition

Before the change the following race was possible between threads A, B, C:

- A calls invalidate(K)
- A increments: invalidations++
- B changes state to be cached, and therefore calls invalidate(K) too
- B increments: invalidations++
- C calls get(K)
- C reads invalidations counter
- C retrieves current token T for key K
- C reads value V for T from cache
- A reads and removes token T (same) for key K
- B attempts to read and remove token for key K, not found
- B exits invalidate(K)
- C checks invalidations counter (unchanged since its earlier read)
- C revives, i.e. re-inserts token T for key K
- B calls get(K)
- B retrieves token T (same) for key K
- B reads value V for T from cache -- despite B having called
  invalidate(K)

At least in this situation the problem is transient. Thread A will momentarily
invalidate dataCache for token T, completing the invalidation.

This commit fixes this. The bug was introduced by token reviving (commit
17faae3). This commit reverts that one
and provides a different solution to the problem that commit was
solving.

* Pin 3rd party CI action versions

GitHub allows deleting and re-publishing a tag, so referencing a third-party
action by tag name should be discouraged.

* Add encoding to error code in OAuth2 callback handler

* Rearchitect IR optimizer and evaluator

Model it as a collection of independent transformations,
similar to the plan optimizer.

* Distribute comparisons over case and switch

Transforms expressions such as:

    a = case
       when c1 then r1
       when c2 then r2
       else r3
       end

into

    case
       when c1 then a = r1
       when c2 then a = r2
       else a = r3
    end

Additionally, simplifies expressions such as:

    case
       when c then true
       else false
    end

* Enable Dependabot for GitHub Actions

Dependabot will automatically open pull requests to update GitHub
Actions. The main benefit of this is knowing that new versions are
available, especially if third-party GitHub Actions are pinned to
specific versions. Running it weekly should give enough time to review
updates, and also react swiftly to potential security updates.

* Avoid abort in testing helper method or for-loop

* Convert ResourceGroupInfo to a record class

* Make sure expression type does not change after processing with rule

* Modify Alluxio version to stable 2.9.4

* Include detailed error in S3 batch delete message

* Clarify the default value of iceberg.catalog.type in Iceberg docs

* Remove unused argument from getSparkTableStatistics

* Fix description in Redis plugin

* Optimized DATE_TRUNC function

* Update libraries-bom to 26.41.0

* Update clickhouse-jdbc to 0.6.1

* Update nimbus-jose-jwt to 9.40.0

* Update AWS SDK v2 to 2.26.1

* Update GCS connector to hadoop3-2.2.23

* Update AWS SDK v1 to 1.12.742

* Update nessie to 0.90.4

* Update elasticsearch to 7.17.22

* Set hive.metastore.catalog.dir in Iceberg and Delta query runners

* Remove redundant annotation

* Use takari-smart-builder

It's a default builder for mvnd

* Replace air.main.basedir with session.rootDirectory

* Update maven to 3.9.8

This version is aligned with mvnd-1.0.0

* Migrate iceberg caching product test to native filesystem

* Fix argument order in ST_Point() function docs

* Update docker-image version to 97

* Test allowColumnDefaults writer feature in Delta Lake

* Check duplicated field names in Iceberg

* Update jline to 3.26.2

* Update parquet to 1.14.1

* Improve code comment

* Expose driver execution stats via JMX

* Clean up code comments in JdbcJoinPushdownUtil

* Clarify required configuration for fs caching

* Track rate of pending acquires processing in BinPackingNodeAllocatorService

* Add reason to pending acquires counters in BinPackingNodeAllocatorService

* Install plugins in testing server before startup completes

When startup completes (`StartupStatus.startupComplete()`), a worker
will accept tasks (`TaskResource.failRequestIfInvalid`). The plugins
and functions need to be installed before that happens.

* Migrate unsupported writer version to integration test in Delta

* Migrate unsupported writer feature to integration test in Delta

* Remove testTrinoAlterTablePreservesChangeDataFeed

testTrinoAlterTablePreservesTableMetadata verifies
that table configuration is preserved.

* Migrate append-only to integration test in Delta

* Use correct isolation level in writing the commit info for DELETE

* Add Trino 450 release notes

* Document S3 max connections default change

* [maven-release-plugin] prepare release 450

* [maven-release-plugin] prepare for next development iteration

* Make AsyncResponse aware of the client disconnection

With the current Jersey and Jetty implementations @suspended AsyncResponse
can generate a lot of warnings that are caused by the fact that Jetty eagerly recycles
request/response objects when the HTTP/2 RST_STREAM frame is sent by the client during request aborting.

We abort in-flight requests for different reasons: either a timeout is reached or the client is closed during cleanup.

For HTTP/1 that doesn't matter, but for HTTP/2 this will cause the AsyncResponse.resume to log an error,
since the underlying request/response objects are already recycled and there is no client listening for the response.

This change also cancels Future<?> bound to the AsyncContext since there is no point in processing it anymore.

* Retain authorization during client redirects for basic auth

* Create tables under volatile directory in TestDeltaLakeBasic

catalogDir will be deleted after the test execution.

* Fix wrong entry in 450 release note

* Disallow row type without field names in Delta Lake

* Check duplicated field names in Delta Lake

Also, include exception message in DeltaLakeMetadata.addColumn.

* Implement incremental refresh for single-table, predicate-only MVs (trinodb#20959)

Add incremental refresh support for simple Iceberg MVs

When MV refresh is executing, the planner suggests either incremental or full refresh to the connector. In this first phase, incremental refresh is suggested only when the plan tree contains nothing other than Scan/Filter/Project nodes; otherwise, full refresh is performed. The Iceberg connector can act on this signal and use IncrementalAppendScan to scan only the 'delta' records and append them to the MV storage table (without truncation).

Co-authored-by: Karol Sobczak <[email protected]>

* Add Expression to Result in ExpressionAssertProvider

* Add transform function const

* Add JsonStringArrayExtract function

Optimize the common case of transform(cast(json_parse(varchar_col) as array<json>), json -> json_extract_scalar(json, jsonPath)) to
be a single, more performant call of JsonStringArrayExtract.

* Expose lease distribution stats from BinPackingNodeAllocatorService

* Update airlift to 249

* Fix product tests previously migrated to native fs to not use legacy fs

* Add non-null check in SortingFileWriter

* Allow adding and dropping fields to records wrapped in an array

* Allow changing field type in records wrapped in an array

* Refactor TestJsonStringArrayExtractScalar

* Use List/Set injection into configuration

This makes configuration more explicit in regard to expected types

* Fix reading empty files when using native S3 FS

* Fix configuration mistake

* Allow reading empty files in native GCS FS

* Workaround for reading empty files using native S3 FS

When reading an empty file from S3 as a stream using the AWS SDK v2, the SDK
incorrectly keeps reading the MD5 checksum included at the end of the
response. This has been reported upstream as
aws/aws-sdk-java-v2#3538

* Optimize IntegerNumberToDoubleCoercer#applyCoercedValue to avoid repeated reading of Block values

* Add support for prefix configuration in Iceberg REST catalog

REST catalog supports a `prefix` in the resource path as per the spec.
But it is not exposed from Trino `IcebergRestCatalogConfig`.
In Spark, this works by default because they accept a generic property bag.

* Rename getColumns to getTopLevelColumns in iceberg connector

* Support partitioning on nested ROW fields in Iceberg

Co-authored-by: Victoria Bukta <[email protected]>
Co-authored-by: Yuya Ebihara <[email protected]>

* Add Parquet Bloom filter write support to Iceberg connector

* List unhealthy containers when product tests health check fails

* Reschedule replacement tasks in case of artificial failure due to missing spooling output size

The code introduced in trinodb#22298 was
not rescheduling replacement tasks. As a result, query execution got
stuck after we observed missing spooling output stats for a task.

* Convert inner classes in ConstantPopulatingPageSource to record

* Convert Iceberg PartitionColumn to record

* Convert IcebergInputInfo to record

* Convert IcebergPageSink.PartitionColumn to record

* Convert Iceberg ReaderPageSourceWithRowPositions to record

* Convert IcebergParquetColumnIOConverter.FieldContext to record

* Convert IcebergPartitionColumn to record

* Convert PartitionTransforms.ColumnTransform to record

* Use field id in freeCurrentRowGroupBuffers for consistency

* Remove interning from ColumnChunkProperties

Avoids having unbounded and unaccounted memory usage.
Also avoids the overheads associated with concurrent map look-up

* Convert ColumnChunkProperties into a record

* Removed unused code from ColumnChunkMetadata

* Optimize ParquetReader#getColumnChunkMetaData

Avoid unnecessary conversion to ColumnPath
Compare arrays in reverse to find mismatch quicker

* Optimize ParquetTypeUtils#getDescriptors

Existing logic was complex due to performing case-insensitive matching.
This was unnecessary because fileSchema and requestedSchema already contain lower-cased names.
Also, since requestedSchema is derived from fileSchema, we can build the descriptors map directly
from the result of getColumns instead of repeating look-ups in fileSchema.

* Update jjwt to 0.12.6

* Update minio to 8.5.11

* Update RoaringBitmap to 1.1.0

* Update AWS SDK v1 to 1.12.749

* Update AWS SDK v2 to 2.26.8

* Update oauth2-oidc-sdk to 11.13

* Update metrics-core to 4.2.26

* Update Nessie to 0.91.2

* Update openlineage to 1.17.1

* Update Azure SDK

Azure SDK version 1.2.24 includes Azure Storage Blob 12.26, which
introduces a breaking change, where blob names used to create a new blob
client should not be URL-encoded anymore. Blob URLs in the storage don't
change, so after adjusting the API usage, data written by older Trino
versions should be readable by new Trino versions.

* Disable caching of /ui responses

This ensures that any sensitive information is not cached in the browser

* Allow adding non-temurin JDK distributions

This is a preparation for testing additional builds like Loom EA

* Add Loom EA build

* Verify ppc64le docker image

* Add clarifying javadoc

* Add more http proxy configs for native S3 filesystem implementation

To bring the HTTP proxy config options to parity with the legacy implementation

* Push filter through TopNRankingNode

* Fix initial value of diagnostics counter

* Fix handling RESERVED status when updating stats

* Allow uppercase characters in Iceberg quoted identifiers partition

* Add special characters missing test coverage for native-fs s3 path

* Use enhanced instanceof in KuduClientSession

* Fix failure when filter operation is applied on varbinary column in Kudu

* Remove enable-coordinator-dynamic-filters-distribution config

This has been enabled by default for a long time without issues.
It is no longer necessary to keep it configurable.

* Migrate Hive view in Delta to integration tests

* Migrate unregistering non-delta table to integration tests

* Throw TrinoException for Kudu failures

Inserting/Merging/Updating invalid data in Kudu doesn't throw a Kudu-specific
exception, so we wrap all the exceptions as Trino-specific exceptions.

* Add date data type support for Kudu

Co-authored-by: Pascal Gasp <[email protected]>

* Add t_cdf and t_pdf scalar functions

* Fix spelling errors in documentation

* Fix incorrect case optimization

Expressions of the form:

   CASE
      WHEN <cond> THEN true
      ELSE false
   END

were being simplified to <cond>, which is
incorrect due to null handling. The original
form would return false, while the simplified
form returns null.

* Allow pre-allocating memory per catalog on initialization

* Add Trino 451 release notes

* [maven-release-plugin] prepare release 451

* Merge fixes for Trino 451 (#35)

---------

Co-authored-by: praveenkrishna.d <[email protected]>
Co-authored-by: Yiqun Zhang <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Yuya Ebihara <[email protected]>
Co-authored-by: regadas <[email protected]>
Co-authored-by: Manfred Moser <[email protected]>
Co-authored-by: Colebow <[email protected]>
Co-authored-by: uditkumar <[email protected]>
Co-authored-by: Łukasz Osipiuk <[email protected]>
Co-authored-by: Rafał Połomka <[email protected]>
Co-authored-by: Ashhar Hasan <[email protected]>
Co-authored-by: Jack Klamer <[email protected]>
Co-authored-by: Sasha Sheikin <[email protected]>
Co-authored-by: Mateusz "Serafin" Gajewski <[email protected]>
Co-authored-by: Raunaq Morarka <[email protected]>
Co-authored-by: Anu Sudarsan <[email protected]>
Co-authored-by: takezoe <[email protected]>
Co-authored-by: Kamil Endruszkiewicz <[email protected]>
Co-authored-by: Lukasz Stec <[email protected]>
Co-authored-by: chenjian2664 <[email protected]>
Co-authored-by: Mayank Vadariya <[email protected]>
Co-authored-by: Alexey Pavlenko <[email protected]>
Co-authored-by: Karol Sobczak <[email protected]>
Co-authored-by: Dejan Mircevski <[email protected]>
Co-authored-by: Jan Waś <[email protected]>
Co-authored-by: Grant Nicholas <[email protected]>
Co-authored-by: Keith Whitley <[email protected]>
Co-authored-by: ajantha-bhat <[email protected]>
Co-authored-by: Grzegorz Nowak <[email protected]>
Co-authored-by: Martin Traverso <[email protected]>
Co-authored-by: Jianjian <[email protected]>
Co-authored-by: senlizishi <[email protected]>
Co-authored-by: Alexander Kolesnikov <[email protected]>
Co-authored-by: Gontzal Monasterio <[email protected]>
Co-authored-by: Marius Grama <[email protected]>
Co-authored-by: Star Poon <[email protected]>
Co-authored-by: Marton Bod <[email protected]>
Co-authored-by: Brad <[email protected]>
Co-authored-by: Vikash Kumar <[email protected]>
Co-authored-by: Victoria Bukta <[email protected]>
Co-authored-by: Jonas Irgens Kylling <[email protected]>
Co-authored-by: kasiafi <[email protected]>
Co-authored-by: Pascal Gasp <[email protected]>
Co-authored-by: Emily Sunaryo <[email protected]>
Co-authored-by: neriya shulman <[email protected]>