Implement dynamic row filtering #22411

raunaqmorarka · 2024-06-18T07:15:27Z

Description

Dynamic row filtering performs fine-grained filtering of rows in the scan operator,
which can greatly improve the performance of queries with selective joins.
So far, dynamic filters have been pushed into connectors, which have used them for
partition, bucket, split and row-group/stripe pruning. This change adds evaluation of
dynamic filters in the engine on worker nodes, after the usual static filter (if any) has been
evaluated in ScanFilterProject.
Non-selective dynamic filters are automatically detected and removed from execution,
so that the overhead of evaluating these filters stays low when they are not useful.
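
To make the last point concrete, here is a minimal sketch of how a row-level filter could track its own selectivity and switch itself off. It is an illustration only, not the code in this PR: the class name `SelectivityAwareFilter`, the `selectivityThreshold` and `sampleSize` parameters, and the single `long[]` column are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.function.LongPredicate;

// Hypothetical sketch: not Trino's DynamicPageFilter or FilterEvaluator.
final class SelectivityAwareFilter {
    private final LongPredicate dynamicFilter;
    // Assumed knob: the filter is kept only while the fraction of rows it passes stays at or below this value.
    private final double selectivityThreshold;
    // Assumed knob: number of input rows to observe before judging the filter.
    private final long sampleSize;
    private long rowsSeen;
    private long rowsKept;
    private boolean disabled;

    SelectivityAwareFilter(LongPredicate dynamicFilter, double selectivityThreshold, long sampleSize) {
        this.dynamicFilter = dynamicFilter;
        this.selectivityThreshold = selectivityThreshold;
        this.sampleSize = sampleSize;
    }

    // 'positions' are the rows that already passed the static filter.
    // Returns the subset of those positions that also pass the dynamic filter.
    int[] filter(long[] column, int[] positions) {
        if (disabled) {
            return positions; // filter was judged non-selective: pass rows through untouched
        }
        int[] selected = new int[positions.length];
        int count = 0;
        for (int position : positions) {
            if (dynamicFilter.test(column[position])) {
                selected[count++] = position;
            }
        }
        rowsSeen += positions.length;
        rowsKept += count;
        // Once enough rows were observed, drop the filter if it lets too many rows through.
        if (rowsSeen >= sampleSize && (double) rowsKept / rowsSeen > selectivityThreshold) {
            disabled = true;
        }
        return Arrays.copyOf(selected, count);
    }

    boolean isDisabled() {
        return disabled;
    }
}
```

With a threshold of 0.3, a filter that keeps half of the sampled rows is dropped after the sample, while one that keeps a quarter of them stays active.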

Additional context and related issues

Fixes #13305

Release notes

( ) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
(x) Release notes are required, with the following suggested text:

# General
* Improve performance of queries with selective joins by performing fine-grained filtering of rows using dynamic filters.
  This optimization is enabled by default and can be disabled using the `enable-dynamic-row-filtering` configuration property or the `enable_dynamic_row_filtering` session property. ({issue}`22411`)

core/trino-main/src/main/java/io/trino/operator/project/PageProcessorMetrics.java

dain

I'm concerned about adding yet more system properties and config for this. I'm fine having a kill switch if the feature has problems, but I'd make them hidden, and generally we should remove them when we think the feature is working.

core/trino-main/src/main/java/io/trino/sql/gen/columnar/CallColumnarFilterGenerator.java

core/trino-main/src/main/java/io/trino/sql/gen/columnar/InColumnarFilterGenerator.java

raunaqmorarka · 2024-06-28T02:19:36Z

Dynamic row filtering parquet unpartitioned sf1k.pdf

Dynamic row filtering parquet partitioned sf1k.pdf

raunaqmorarka · 2024-06-28T02:49:51Z

> I'm concerned about adding yet more system properties and config for this. I'm fine having a kill switch if the feature has problems, but I'd make them hidden, and generally we should remove them when we think the feature is working.

Since this is a new implementation, we need the properties so that we can easily root-cause any potential issues. They also make it easy for us to write tests for the feature. The selectivity threshold is something that a user might legitimately want to tune for their workload. Making the properties hidden just hinders their usage. I would like to keep them as normal properties for now, as there isn't anything harmful about them.

core/trino-main/src/main/java/io/trino/sql/planner/DomainTranslator.java

core/trino-main/src/main/java/io/trino/sql/ir/optimizer/rule/SimplifyContinuousInValues.java

core/trino-main/src/test/java/io/trino/sql/ir/optimizer/TestSimplifyContinuousInValues.java

core/trino-main/src/main/java/io/trino/sql/gen/columnar/FilterEvaluator.java

core/trino-main/src/main/java/io/trino/sql/gen/columnar/DynamicPageFilter.java

core/trino-main/src/main/java/io/trino/sql/gen/ExpressionCompiler.java

So far, dynamic filters have been pushed into connectors, which have used them to filter data at whatever level of granularity they support (e.g. partition, bucket, file, split, row-group). This change adds evaluation of dynamic filters in the engine on worker nodes, after the usual static filter (if any) has been evaluated in ScanFilterProject.

Non-selective dynamic filters are automatically detected and removed from execution, so that the overhead of evaluating these filters stays low when they are not useful.

BenchmarkInCodeGenerator columnarEvaluationEnabled

| (hitRate) | (inListCount) | (type) | Mode | Cnt | Before Score | After Score | Units |
|---|---|---|---|---|---|---|---|
| 0.1 | 2 | bigint | avgt | 12 | 9.638 ± 0.265 | 9.138 ± 0.709 | us/op |
| 0.1 | 4 | bigint | avgt | 12 | 10.549 ± 0.682 | 8.410 ± 0.060 | us/op |
| 0.1 | 25 | bigint | avgt | 12 | 30.833 ± 4.390 | 8.967 ± 0.346 | us/op |
| 0.1 | 100 | bigint | avgt | 12 | 33.023 ± 5.527 | 8.691 ± 0.328 | us/op |
| 0.1 | 1000 | bigint | avgt | 12 | 34.606 ± 6.841 | 8.438 ± 0.097 | us/op |
| 0.1 | 10000 | bigint | avgt | 12 | 32.668 ± 4.724 | 8.450 ± 0.121 | us/op |
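
For readers unfamiliar with columnar filter evaluation, the following is a rough sketch of the general idea for an IN predicate: the filter runs over a whole column at once and returns the positions that match. This is not the bytecode that Trino generates; the class name `ColumnarInFilter`, the use of a `HashSet` for membership checks, and the plain `long[]` column are assumptions chosen to keep the example small.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: not the output of InColumnarFilterGenerator.
final class ColumnarInFilter {
    private final Set<Long> inList = new HashSet<>();

    ColumnarInFilter(long... values) {
        // Build the lookup structure once, not once per row.
        for (long value : values) {
            inList.add(value);
        }
    }

    // Scans the column in one pass and returns the positions whose value is in the IN list.
    int[] selectPositions(long[] column) {
        int[] selected = new int[column.length];
        int count = 0;
        for (int position = 0; position < column.length; position++) {
            if (inList.contains(column[position])) {
                selected[count++] = position;
            }
        }
        return Arrays.copyOf(selected, count);
    }
}
```

For example, `new ColumnarInFilter(2, 4).selectPositions(new long[] {1, 2, 3, 4, 2})` selects positions 1, 3 and 4.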

yx-keith · 2024-07-29T05:39:43Z

> Dynamic row filtering parquet unpartitioned sf1k.pdf
> [Dynamic row filtering parquet partitioned sf1k.pdf](https://github.com/user-attachments/files/16023339/Dynamic.row.filtering.parquet.partitioned.sf1k.pdf)

How much TPC-DS data?

raunaqmorarka · 2024-07-29T05:47:32Z

> How much TPC-DS data?

It's scale factor 1000 (1 TB)

cla-bot bot added the cla-signed label Jun 18, 2024

github-actions bot added the hive Hive connector label Jun 18, 2024

raunaqmorarka requested review from sopel39, Dith3r, martint and dain June 18, 2024 07:15

raunaqmorarka mentioned this pull request Jun 18, 2024

Implement dynamic row filtering in hive, iceberg and delta #22175

Closed

raunaqmorarka force-pushed the drf branch from 346e089 to 1096d71 Compare June 18, 2024 07:37

raunaqmorarka added the performance label Jun 18, 2024

raunaqmorarka force-pushed the drf branch 4 times, most recently from 1c99838 to ad6581a Compare June 19, 2024 07:21

Dith3r reviewed Jun 19, 2024

core/trino-main/src/main/java/io/trino/operator/project/PageProcessorMetrics.java

raunaqmorarka force-pushed the drf branch from ad6581a to bf38362 Compare June 25, 2024 10:22

raunaqmorarka marked this pull request as ready for review June 25, 2024 10:42

dain approved these changes Jun 26, 2024

core/trino-main/src/main/java/io/trino/sql/gen/columnar/CallColumnarFilterGenerator.java

core/trino-main/src/main/java/io/trino/sql/gen/columnar/InColumnarFilterGenerator.java (outdated)

Dith3r approved these changes Jun 26, 2024

raunaqmorarka force-pushed the drf branch 3 times, most recently from eb9e858 to 876dd37 Compare June 26, 2024 14:30

raunaqmorarka requested a review from dain June 26, 2024 15:56

raunaqmorarka force-pushed the drf branch 4 times, most recently from f0cba12 to 13fd7b5 Compare June 27, 2024 05:24

raunaqmorarka force-pushed the drf branch from 13fd7b5 to da552ba Compare June 28, 2024 11:32

martint reviewed Jul 1, 2024

raunaqmorarka force-pushed the drf branch from b74603a to 1230de2 Compare July 2, 2024 11:48

raunaqmorarka requested a review from martint July 2, 2024 11:50

sopel39 reviewed Jul 2, 2024

core/trino-main/src/main/java/io/trino/sql/gen/columnar/FilterEvaluator.java

sopel39 reviewed Jul 2, 2024

core/trino-main/src/main/java/io/trino/sql/gen/columnar/DynamicPageFilter.java

sopel39 reviewed Jul 2, 2024

core/trino-main/src/main/java/io/trino/sql/gen/columnar/DynamicPageFilter.java

sopel39 reviewed Jul 5, 2024

core/trino-main/src/main/java/io/trino/sql/gen/columnar/DynamicPageFilter.java

raunaqmorarka force-pushed the drf branch 2 times, most recently from fffd5f8 to c86e067 Compare July 8, 2024 16:32

martint reviewed Jul 9, 2024

core/trino-main/src/test/java/io/trino/sql/ir/optimizer/TestSimplifyContinuousInValues.java (outdated)

raunaqmorarka added 9 commits July 10, 2024 23:15

Remove BenchmarkFileFormatsUtils

6b8002b

Support nullable return functions in columnar filter evaluation

f2c872f

Support true/false filters in columnar filter evaluation

15f55c1

Drop ineffective dynamic row filters based on selectivity

2f69e92

Non-selective dynamic filters are automatically detected and removed from execution, so that the overhead of evaluating these filters stays low when they are not useful.

Simplify FunctionManager#createTestingFunctionManager

19a0309

Implement ShortDecimalType#getRange/getPreviousValue/getNextValue

26e7d87

Implement TimeType#getRange/getPreviousValue/getNextValue

661ee0f

raunaqmorarka force-pushed the drf branch from c86e067 to 49d280b Compare July 10, 2024 20:36

github-actions bot added the iceberg Iceberg connector label Jul 10, 2024

raunaqmorarka requested a review from martint July 10, 2024 20:38

martint approved these changes Jul 10, 2024

raunaqmorarka merged commit 73a5581 into trinodb:master Jul 11, 2024
102 checks passed

raunaqmorarka deleted the drf branch July 11, 2024 04:08

github-actions bot added this to the 452 milestone Jul 11, 2024

raunaqmorarka mentioned this pull request Jul 11, 2024

[WIP] Dynamic row filtering #5204

Closed

colebow mentioned this pull request Jul 11, 2024

Add Trino 452 release notes #22573

Merged
