Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Never precalculate stats for LIMIT pushdown #22471

Closed
wants to merge 1 commit into from

Conversation

mwd410
Copy link
Contributor

@mwd410 mwd410 commented Jun 21, 2024

Description

This removes the ability to precalc stats for a LIMIT pushdown. I am unaware of a situation where the stats would be useful for a simple LIMIT pushdown. We could obviously make this specific to sql server only if requested, but I don't see value in this regardless.

Additional context and related issues

This came out of a complaint from a starburst customer. In SQL Server, because the statistics are gathered on a per-column basis, this causes SELECT * FROM foo LIMIT 3 to be much, much slower than SELECT * FROM foo. I attempted to improve the performance of that procedure using dm_db_stats_histogram, but this proved to not have all the same information. The only other remediation would be to not precalculate stats for LIMIT pushdown on sql server specifically, but i don't believe a simple limited table scan could possibly benefit from stats, hence this PR to remove it entirely.

Release notes

(X) This is not user-visible or is docs only, and no release notes are required.
( ) Release notes are required. Please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Jun 21, 2024
@mwd410 mwd410 requested a review from findepi June 21, 2024 15:42
@github-actions github-actions bot added iceberg Iceberg connector mongodb MongoDB connector labels Jun 21, 2024
@mwd410 mwd410 changed the title Never precalculate stats in default JDBC for applyLimit Never precalculate stats for LIMIT pushdown Jun 21, 2024
@mwd410 mwd410 requested a review from alexjo2144 June 21, 2024 17:46
@findepi
Copy link
Member

findepi commented Jun 24, 2024

This removes the ability to precalc stats for a LIMIT pushdown. I am unaware of a situation where the stats would be useful for a simple LIMIT pushdown.

Thats would be useful when LIMIT is part of bigger query, where there will be stats-based decisions later on.
I admit it's probably not the first query shape that comes to mind when thinking about a query with LIMIT.

Given that the engine has the full knowledge of the query shape, why not let the engine ignore the request to precalc stats when the engine knows the stats won't be used later on?
Or, equivalently, let the precalc be lazy.

@hashhar hashhar requested a review from SemionPar June 24, 2024 13:11
@mwd410
Copy link
Contributor Author

mwd410 commented Jun 25, 2024

Thats would be useful when LIMIT is part of bigger query, where there will be stats-based decisions later on.

@findepi I assumed that if we pushed down both a filter and a limit, and only the filter wanted to precalc stats, that we still would use those stats - is that incorrect?

Given that the engine has the full knowledge of the query shape, why not let the engine ignore the request to precalc stats when the engine knows the stats won't be used later on?

That sounds like a reasonable plan. I'll see if I can figure that out.

@findinpath findinpath requested a review from wendigo June 25, 2024 14:09
@mwd410 mwd410 closed this Jul 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cla-signed iceberg Iceberg connector mongodb MongoDB connector
Development

Successfully merging this pull request may close these issues.

None yet

2 participants