feat: experimental runtime bloom pruning #15382

Draft
wants to merge 3 commits into
base: main
from

Conversation

dantengsky
Member

@dantengsky dantengsky commented Apr 30, 2024

I hereby agree to the terms of the CLA available at: https://docs.databend.com/dev/policies/cla/

Summary

Implements runtime pruning of probe-side data blocks by using the runtime filter (based on the min-max filter) together with the bloom filter index of the probe table.

  • While constructing the min-max filters, replace the range filter expression with an eq filter expression if min equals max.

    The eq filter expression is compatible with both the range index and the bloom index.

  • During runtime filtering of probe-side data, if runtime min-max pruning fails to prune a block, the bloom filter is tried next.

  • Add a new profile metric, RuntimeBloomFilterPrunedParts, which records the number of blocks pruned by the bloom filter.

  • Fixes #[Link the issue here]
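
The pruning order described in the summary can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `RuntimeFilter`, `Bloom`, `BlockMeta`, `Pruned`, and all function names are hypothetical, the key column is assumed to be `i64`, and the bloom filter is a toy 64-bit bitmap with two arithmetic hash probes rather than the real bloom index.

```rust
/// Hypothetical runtime filter derived from the build side's min/max values.
#[derive(Debug, PartialEq)]
enum RuntimeFilter {
    Range { min: i64, max: i64 },
    Eq(i64),
}

/// Collapse the range into an eq filter when min equals max,
/// so that the bloom index can also be consulted later.
fn build_runtime_filter(min: i64, max: i64) -> RuntimeFilter {
    if min == max {
        RuntimeFilter::Eq(min)
    } else {
        RuntimeFilter::Range { min, max }
    }
}

/// Toy bloom filter (assumption: 64 bits, two simple hash probes).
struct Bloom {
    bits: u64,
}

impl Bloom {
    fn positions(v: i64) -> [u32; 2] {
        let u = v as u64;
        [(u % 64) as u32, (u.wrapping_mul(7).wrapping_add(3) % 64) as u32]
    }

    fn from_values(values: &[i64]) -> Self {
        let mut bits = 0u64;
        for &v in values {
            for p in Self::positions(v).iter() {
                bits |= 1u64 << *p;
            }
        }
        Bloom { bits }
    }

    /// False means "definitely absent"; true means "possibly present".
    fn may_contain(&self, v: i64) -> bool {
        Self::positions(v).iter().all(|p| self.bits & (1u64 << *p) != 0)
    }
}

/// Hypothetical per-block statistics of the probe table.
struct BlockMeta {
    min: i64,
    max: i64,
    bloom: Bloom,
}

fn block_from_values(values: &[i64]) -> BlockMeta {
    BlockMeta {
        min: values.iter().copied().min().expect("block must be non-empty"),
        max: values.iter().copied().max().expect("block must be non-empty"),
        bloom: Bloom::from_values(values),
    }
}

#[derive(Debug, PartialEq)]
enum Pruned {
    No,
    ByMinMax,
    ByBloom,
}

/// Try min-max pruning first; if the block survives and the filter is an
/// eq filter, fall back to the bloom filter.
fn prune_block(filter: &RuntimeFilter, block: &BlockMeta) -> Pruned {
    let (lo, hi) = match filter {
        RuntimeFilter::Range { min, max } => (*min, *max),
        RuntimeFilter::Eq(v) => (*v, *v),
    };
    if hi < block.min || lo > block.max {
        return Pruned::ByMinMax;
    }
    if let RuntimeFilter::Eq(v) = filter {
        if !block.bloom.may_contain(*v) {
            return Pruned::ByBloom;
        }
    }
    Pruned::No
}

/// Counts blocks pruned by the bloom filter, analogous to what the
/// RuntimeBloomFilterPrunedParts metric is described as recording.
fn bloom_pruned_count(filter: &RuntimeFilter, blocks: &[BlockMeta]) -> usize {
    blocks
        .iter()
        .filter(|b| prune_block(filter, b) == Pruned::ByBloom)
        .count()
}
```

In this sketch, a block holding the values 10, 20, and 30 is not pruned by min-max for the filter `Eq(25)`, because 25 lies inside [10, 30], but the bloom probe rules it out. A `Range` filter never reaches the bloom step, which is why collapsing min == max into an eq filter matters.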

Tests

  • Unit Test
  • Logic Test
  • Benchmark Test
  • No Test - Explain why

Type of change

  • Bug Fix (non-breaking change which fixes an issue)
  • New Feature (non-breaking change which adds functionality)
  • Breaking Change (fix or feature that could cause existing functionality not to work as expected)
  • Documentation Update
  • Refactoring
  • Performance Improvement
  • Other (please describe):


@dantengsky dantengsky changed the title Feat: experimental runtime bloom filter feat: experimental runtime bloom filter Apr 30, 2024
@github-actions github-actions bot added the pr-feature this PR introduces a new feature to the codebase label Apr 30, 2024
@dantengsky dantengsky force-pushed the feat-rt-bloom branch 2 times, most recently from fb634ab to b924e8a Compare April 30, 2024 06:49
@dantengsky dantengsky added the ci-benchmark Benchmark: run all test label May 1, 2024
Contributor

github-actions bot commented May 1, 2024

Docker Image for PR

  • tag: pr-15382-6ac94f2

note: this image tag is only available for internal use,
please check the internal doc for more details.

@dantengsky dantengsky changed the title feat: experimental runtime bloom filter feat: experimental runtime bloom pruning May 6, 2024
@dantengsky dantengsky force-pushed the feat-rt-bloom branch 5 times, most recently from 34cefb6 to e9d6d67 Compare May 6, 2024 08:59
@dantengsky dantengsky added ci-benchmark Benchmark: run all test and removed ci-benchmark Benchmark: run all test labels May 6, 2024
Contributor

github-actions bot commented May 6, 2024

Docker Image for PR

  • tag: pr-15382-3037a5f

note: this image tag is only available for internal use,
please check the internal doc for more details.

@xudong963 xudong963 self-requested a review May 8, 2024 10:42
@dantengsky
Member Author

@xudong963 Thanks for helping review this PR; I really appreciate it. I'll make further adjustments to avoid using the bloom filter in situations where false positives would make bloom pruning nearly ineffective.
