Prune casted partition columns #22116

Open
Tracked by #22114
sopel39 opened this issue May 24, 2024 · 0 comments
sopel39 commented May 24, 2024

Some tables have string-typed partition keys, so users might write queries like:

WHERE CAST(part_col AS DATE) > DATE '1992-01-01'

TupleDomains are derived from such predicates. However, connectors do not consume the predicate CAST(part_col AS DATE) > DATE '1992-01-01', so it stays in ScanFilterProject as the remaining predicate.

This significantly reduces the cache hit ratio, because part_col predicates are often unique.

Hence, in order to improve subquery cache for such cases we need to:

  1. Introduce a new method such as TupleDomain<ColumnHandle> ConnectorPageSourceProvider#getSplitPredicate that returns a TupleDomain<ColumnHandle> describing the split. Such a method could also be used internally by prunePredicate, so it fits naturally. In particular, it would return the part_col value as a Domain#singleValue.

  2. Enhance CommonPlanAdaptation.PlanSignatureWithPredicate to also contain predicates that could not be translated into a TupleDomain but touch scan columns, e.g. CAST(part_col AS DATE) > DATE '1992-01-01'.

  3. Enhance CacheDriverFactory to run ExpressionInterpreter on non-TupleDomain predicates with the NullableValues from getSplitPredicate, so that such predicates can be simplified.

  4. Put the simplified non-TupleDomain predicates into CacheSplitId.
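
To make steps 1, 3 and 4 concrete, here is a rough, self-contained Java sketch of the idea. It does not use any Trino classes: SplitPredicateSketch, CastDateGreaterThan, simplify and cacheSplitId are hypothetical stand-ins, the map of split values plays the role of what getSplitPredicate might return, and the string-based cache id is only an assumption about how the simplified predicate could end up in CacheSplitId. NULL partition values are not handled.

```java
import java.time.LocalDate;
import java.util.Map;

// Hypothetical sketch only; none of these types exist in Trino.
class SplitPredicateSketch {
    // Stand-in for a predicate that could not be translated into a TupleDomain:
    // CAST(<column> AS DATE) > DATE '<bound>'
    record CastDateGreaterThan(String column, LocalDate bound) {}

    // splitValues stands in for the proposed getSplitPredicate result:
    // each partition column of the split is fixed to a single (string) value.
    // If the predicate's column is fixed, the predicate folds to a constant;
    // otherwise it is returned unchanged as text.
    static String simplify(CastDateGreaterThan predicate, Map<String, String> splitValues) {
        String value = splitValues.get(predicate.column());
        if (value == null) {
            return "CAST(" + predicate.column() + " AS DATE) > DATE '" + predicate.bound() + "'";
        }
        // Emulates CAST(part_col AS DATE) for ISO-formatted partition values.
        boolean matches = LocalDate.parse(value).isAfter(predicate.bound());
        return matches ? "TRUE" : "FALSE";
    }

    // Assumed shape of a cache key: split identity plus the simplified predicate.
    static String cacheSplitId(String splitId, String simplifiedPredicate) {
        return splitId + "|" + simplifiedPredicate;
    }

    public static void main(String[] args) {
        Map<String, String> split = Map.of("part_col", "1995-06-17");
        CastDateGreaterThan q1 = new CastDateGreaterThan("part_col", LocalDate.parse("1992-01-01"));
        CastDateGreaterThan q2 = new CastDateGreaterThan("part_col", LocalDate.parse("1993-03-15"));

        // Different literals in the two queries, same cache key for this split.
        System.out.println(cacheSplitId("split-7", simplify(q1, split)));
        System.out.println(cacheSplitId("split-7", simplify(q2, split)));
    }
}
```

In this sketch both queries produce the key split-7|TRUE for the same split, even though their date literals differ. That is the effect the steps above aim for: the unique literal disappears from the cache key once the predicate is evaluated against the split's single partition value.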

@sopel39 sopel39 added the subquery-cache Label for subquery cache relates issues label May 24, 2024
@sopel39 sopel39 self-assigned this Jun 18, 2024