Improve determinism by preserving split order #22475

Merged: sopel39 merged 1 commit into trinodb:master on Jul 1, 2024

Conversation

@gaurav8297 gaurav8297 commented Jun 21, 2024

Description

This change improves query execution determinism by preserving split order during scheduling in pipeline execution mode.

This is needed so that splits coming out of CacheSplitSource keep the same order between the coordinator and the workers. This way we avoid scheduling two splits with the same CacheSplitId to reuse the cache within a query.
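
To illustrate the idea of order-preserving scheduling, here is a minimal sketch. It is not the code from this PR: `SplitOrderSketch`, `assignInOrder`, the round-robin assignment, and the use of plain strings as splits and workers are all hypothetical names and simplifications chosen for the example. It only shows that when assignment relies on insertion-ordered collections, each worker receives its splits in the same relative order the source produced them, so repeated runs give the same result.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names are Trino classes.
final class SplitOrderSketch {
    private SplitOrderSketch() {}

    // Assigns splits to workers round-robin (an assumption made for this example)
    // and returns, per worker, the splits in the order the source emitted them.
    static Map<String, List<String>> assignInOrder(List<String> splits, List<String> workers) {
        // LinkedHashMap iterates in insertion order, so worker order is stable as well.
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String worker : workers) {
            assignment.put(worker, new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            String worker = workers.get(i % workers.size());
            // ArrayList appends at the end, preserving the source order per worker.
            assignment.get(worker).add(splits.get(i));
        }
        return assignment;
    }
}
```

Calling `SplitOrderSketch.assignInOrder(List.of("a", "b", "c", "d"), List.of("w1", "w2"))` gives `w1` the splits `a, c` and `w2` the splits `b, d`, in that order on every run. A hash-based set in the same position would give no such ordering guarantee.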

Additional context and related issues

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

starburstdata-automation commented Jun 21, 2024

Started benchmark workflow for this PR with test type = iceberg/sf10000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: failure
Status message:

@sopel39 sopel39 left a comment

LGTM. Let's see benchmark results

starburstdata-automation commented Jun 25, 2024

Started benchmark workflow for this PR with test type = hive/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: No baseline found.
Benchmark Comparison Report

@gaurav8297

CI issue: #18697 (comment)

@gaurav8297 gaurav8297 requested a review from sopel39 June 26, 2024 05:50
starburstdata-automation commented Jun 28, 2024

Started benchmark workflow for this PR with test type = iceberg/sf1000_parquet_part.

Building Trino finished with status: success
Benchmark finished with status: success
Status message: NO Regression found.
Benchmark Comparison Report

@@ -22,6 +22,8 @@

nit: commit message references stuff not yet landed

@sopel39 sopel39 merged commit 452c627 into trinodb:master Jul 1, 2024
95 checks passed
@sopel39 sopel39 added the no-release-notes This pull request does not require release notes entry label Jul 1, 2024
@github-actions github-actions bot added this to the 452 milestone Jul 1, 2024
Labels
cla-signed no-release-notes This pull request does not require release notes entry
3 participants