Disable JDBC join pushdown #22442
Conversation
@@ -31,7 +31,5 @@ protected void setup(Binder binder)
 {
     configBinder(binder).bindConfig(JdbcJoinPushdownConfig.class);
     bindSessionPropertiesProvider(binder, JdbcJoinPushdownSessionProperties.class);
-
-    configBinder(binder).bindConfigDefaults(JdbcMetadataConfig.class, config -> config.setJoinPushdownEnabled(true));
JdbcJoinPushdownSupportModule is used only by connectors which support cost-based join pushdown.
In JdbcMetadataConfig, your comment says:
"Join pushdown is disabled by default as this is the safer option."
And yet here it is enabled by default.
I'm assuming that meant that it was unsafe to enable in all cases but is safe if we do it based on the cost based estimate.
I think the cost based estimate doesn't account for reduction in probe output due to dynamic filtering. It might be possible to estimate this based on NDV of the columns being joined. @Dith3r maybe we can try building that into the cost estimation.
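The NDV-based idea above could be sketched roughly as follows. This is a hypothetical illustration only, not Trino code; the class and method names are made up, and it assumes uniformly distributed join keys:

```java
// Hypothetical sketch: estimating probe-side row reduction from dynamic
// filtering using NDV (number of distinct values) of the joined columns.
// All names here are illustrative; none come from the Trino codebase.
public class DynamicFilterEstimate
{
    // If the build side contains buildNdv distinct keys and the probe side
    // probeNdv distinct keys, a uniform-distribution assumption suggests
    // roughly this fraction of probe rows survives the dynamic filter.
    public static double probeSurvivalFraction(double probeNdv, double buildNdv)
    {
        if (probeNdv <= 0) {
            return 1.0; // no stats available: conservatively assume no reduction
        }
        return Math.min(1.0, buildNdv / probeNdv);
    }

    public static void main(String[] args)
    {
        // 1M distinct probe keys vs 10k distinct build keys: only ~1% of
        // probe rows are expected to pass the dynamic filter.
        System.out.println(probeSurvivalFraction(1_000_000, 10_000));
    }
}
```

A cost model could multiply the estimated probe output by this fraction before comparing local join cost against pushdown cost.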
Otherwise, I don't think we should outright disable this as this covers a wide variety of data sources and we can't prove that disabling pushdown is the best strategy overall.
I think the cost based estimate doesn't account for reduction in probe output due to dynamic filtering.
I think we need to consider the different join types under the assumption that we want to minimize the data size fetched from the connector
1). Filtering join (e.g. fact table joined with a dimension table), non-expanding
Filtering joins are the most common and should always be beneficial to push down. Even with perfect DFs from Trino, the join output will always be smaller than the probe and build sides
2). Expanding join (multiplies output rows for every probe input row), no filtering
Here DFs are irrelevant (no filtering). Pushdown might still be beneficial depending on build side size. However, expanding joins are usually hard to estimate, so I would avoid pushing them down
3). Expanding join that also does some filtering
Maybe we could estimate the % of rows filtered and the expansion factor for the remaining rows? Generally, I would probably avoid pushdown of expanding joins because of the uncertainty of stats. Assuming DFs work, they should do a good job of filtering probe side data, so we should only get rows that get expanded. Pushing an expanding join to the connector might explode the data produced by the connector.
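The decision rule the three cases above suggest could be sketched like this. This is an illustrative heuristic only, not Trino's implementation, and all names are hypothetical:

```java
// Illustrative sketch of the rule suggested above: push down filtering
// (non-expanding) joins, avoid expanding ones because their estimates are
// too uncertain. Names are hypothetical, not from the Trino codebase.
public class JoinKindHeuristic
{
    public static boolean pushDown(double probeRows, double buildRows, double estimatedOutputRows)
    {
        // Case 1: filtering join - output no larger than either input side,
        // so fetching the joined result from the connector is a win.
        if (estimatedOutputRows <= Math.max(probeRows, buildRows)) {
            return true;
        }
        // Cases 2 and 3: expanding joins - stats are unreliable, and pushing
        // them down might explode the data produced by the connector.
        return false;
    }

    public static void main(String[] args)
    {
        System.out.println(pushDown(1_000_000, 10_000, 50_000));    // filtering -> true
        System.out.println(pushDown(1_000_000, 10_000, 5_000_000)); // expanding -> false
    }
}
```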
I'm assuming that meant that it was unsafe to enable in all cases but is safe if we do it based on the cost based estimate.
yes
What I'm essentially saying is that we probably could simplify the cost decision to expanding vs non-expanding join.
I believe that's the current logic as implemented.
trino/plugin/trino-base-jdbc/src/main/java/io/trino/plugin/jdbc/JdbcJoinPushdownConfig.java
Lines 26 to 28 in 095d35a

// Normally we would put 1.0 as a default value here to only allow joins which do not expand data to be pushed down.
// We use 1.25 to adjust for the fact that NDV estimations sometimes are off and joins which should be pushed down are not.
private double joinPushdownAutomaticMaxJoinToTablesRatio = 1.25;
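A ratio threshold like the 1.25 default above can gate pushdown roughly as follows. This is a simplified sketch of the idea, not the actual Trino logic; the names and structure are illustrative:

```java
// Illustrative sketch (not the actual Trino implementation) of how a
// max join-to-tables ratio gates pushdown: allow it only when the estimated
// join output is not much larger than the combined inputs, i.e. the join is
// non-expanding within the tolerance.
public class JoinPushdownDecision
{
    public static boolean shouldPushDown(double estimatedJoinRows, double leftRows, double rightRows, double maxRatio)
    {
        double inputRows = leftRows + rightRows;
        if (inputRows <= 0 || Double.isNaN(estimatedJoinRows)) {
            return false; // missing or unusable stats: stay conservative
        }
        return estimatedJoinRows / inputRows <= maxRatio;
    }

    public static void main(String[] args)
    {
        // Filtering join: output smaller than the inputs -> push down
        System.out.println(shouldPushDown(500, 1_000, 100, 1.25));
        // Expanding join: output much larger than the inputs -> keep in Trino
        System.out.println(shouldPushDown(10_000, 1_000, 100, 1.25));
    }
}
```

The tolerance above 1.0 absorbs NDV estimation error, as the code comment in the config explains.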
However, by not pushing down a join we might prevent further pushdowns
true
Hence it's not entirely the connector's decision whether or not to do the pushdown
that would require exploratory optimizer
that would require exploratory optimizer
yes, though not a full exploratory optimizer; it would be sufficient to have a bunch of alternatives in the source stage cc. @assaf2
For SQL databases, we should only consider pushdown if a covering index is present, and for Postgres only if the joined tables are not partitioned, as that case underperforms.
For SQL databases, we should only consider push down if a covering index is present and for Postgres if any of the joined tables is not partitioned, as this case underperforms.
I think it's something we should put into the documentation (cc @mosabua ): users should add indexes on the columns they plan to use in joins
If this is indeed the case, then first and foremost the join pushdown code should check for the necessary indexes being present.
I think we should investigate further why performance is degraded.
I think removal because it could degrade performance is not correct. It could (and does) also improve performance in many cases, and as such it should stay on in the base connector. Individual connectors can then potentially override it to always be disabled, change the default to disabled or enabled, and also provide a user-configurable option. Once we know when and why performance is degraded we can make a more informed decision on how to proceed. Do you have any specifics about the problems you are experiencing that caused you to send this PR @Dith3r ?
PostgreSQL sf10 partitioned with analyzed tables suffers a 3x performance degradation (in some rare queries, the degradation is 8-10x). With other PostgreSQL benchmarks, I've not seen any degradation or improvement with this option enabled. I will try to investigate it further.
@Dith3r Have you checked if the Postgres connector provides stats for partitioned tables at all? IIRC we haven't implemented it with partitioned tables in mind, so the stats are either entirely missing or incorrect. Do you see the same issue with unpartitioned tables?
Unpartitioned tables do not suffer from join pushdown in PostgreSQL. For partitioned tables, the connector provides estimates and triggers pushdown, which is not handled well.
Description
Disable JDBC join pushdown as it could degrade performance.
Additional context and related issues
Release notes
( ) This is not user-visible or is docs only, and no release notes are required.
(X) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text: