Tracking issue for the memtable improvement #2804

Closed
31 of 38 tasks
evenyag opened this issue Nov 23, 2023 · 4 comments
Labels
C-enhancement Category Enhancements C-performance Category Performance tracking-issue A tracking issue for a feature.

Comments

@evenyag
Contributor

evenyag commented Nov 23, 2023

What type of enhancement is this?

Performance

What does the enhancement do?

The default memtable implementation has the following areas for improvement:

  • The active buffer has no capacity limit; we might freeze it once its size reaches a threshold
  • compact() and to_batch() are costly
    • Most time series data has monotonically increasing timestamps so we can avoid sorting if data is already sorted
  • At each query, we need to scan each Series to see whether it matches the filter
    • We can use prefix scan if the filter matches a prefix of the primary key
    • In-memory index in the memtable?
  • Dropping lots of Series is costly; we should explore the possibility of reusing Series
  • The basic overhead of ValueBuilder is relatively high when each time series has only a few data points
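
The "avoid sorting if data is already sorted" idea from the list above can be sketched as follows. This is a minimal illustration, not the memtable's actual code: the `Row` type and both function names are hypothetical, and a real buffer would hold columnar data rather than tuples. The point is that a linear scan over adjacent pairs is much cheaper than a sort, so it pays off when most series arrive in timestamp order.

```rust
/// Hypothetical row type for illustration: (timestamp in milliseconds, value).
type Row = (i64, f64);

/// Returns true if the timestamps are non-decreasing.
/// This is a single O(n) pass over adjacent pairs.
fn is_sorted_by_ts(rows: &[Row]) -> bool {
    rows.windows(2).all(|w| w[0].0 <= w[1].0)
}

/// Sorts rows by timestamp only when they are not already sorted.
/// Returns whether a sort was actually performed.
fn sort_if_needed(rows: &mut [Row]) -> bool {
    if is_sorted_by_ts(rows) {
        // Fast path: monotonically increasing input needs no work.
        return false;
    }
    // Slow path: stable sort keeps the insertion order of equal timestamps.
    rows.sort_by_key(|r| r.0);
    true
}
```

Calling `sort_if_needed` on a buffer whose timestamps are already in order returns `false` without moving any rows; out-of-order input falls back to a stable sort.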

Implementation challenges

We might implement some benchmarks and optimize the memtable based on test results.

Implementation history

Non-Blocking

  • More query tests
  • Better freeze strategy
@evenyag evenyag added C-enhancement Category Enhancements C-performance Category Performance labels Nov 23, 2023
@killme2008
Contributor

What's the progress of this issue? Could we convert it to a tracking issue and split tasks? @evenyag @v0y4g3r

@evenyag
Contributor Author

evenyag commented Jan 2, 2024

We need some experiments and benchmarks to examine some ideas.

@evenyag evenyag changed the title Time series memtable improvement Tracking issue for time series memtable improvement Jan 2, 2024
@evenyag evenyag added the tracking-issue A tracking issue for a feature. label Jan 2, 2024
@fengjiachun fengjiachun added this to the v0.7 milestone Jan 2, 2024
@evenyag evenyag changed the title Tracking issue for time series memtable improvement Tracking issue for the memtable improvement Jan 17, 2024
@evenyag
Contributor Author

evenyag commented Mar 7, 2024

There are some scan performance regressions in the new memtable. The fetch_next_partition stage is costly under some circumstances.

greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="0.005"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="0.01"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="0.05"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="0.1"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="0.5"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="1"} 0
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="5"} 10
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="10"} 10
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="60"} 10
greptime_merge_tree_read_stage_elapsed_bucket{stage="fetch_next_partition",le="+Inf"} 10
greptime_merge_tree_read_stage_elapsed_sum{stage="fetch_next_partition"} 20.881826560000004
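
For readers unfamiliar with this output: the `le` buckets are cumulative, so the numbers above mean that all 10 observations took longer than 1 second and at most 5 seconds, and the sum gives a mean of roughly 2.09 seconds per call. A small sketch of that arithmetic, assuming the bucket counts are copied by hand from the dump (the function names here are hypothetical helpers, not part of any metrics library):

```rust
/// Converts cumulative histogram bucket counts (ordered by increasing `le`)
/// into the number of observations that fell into each individual bucket.
fn per_bucket_counts(cumulative: &[u64]) -> Vec<u64> {
    let mut prev = 0u64;
    cumulative
        .iter()
        .map(|&c| {
            // Cumulative counts should never decrease; saturate just in case.
            let in_bucket = c.saturating_sub(prev);
            prev = c;
            in_bucket
        })
        .collect()
}

/// Mean observation in seconds, or None when nothing was observed.
fn mean_seconds(sum: f64, count: u64) -> Option<f64> {
    if count == 0 {
        None
    } else {
        Some(sum / count as f64)
    }
}
```

Feeding in the cumulative counts `[0, 0, 0, 0, 0, 0, 10, 10, 10, 10]` leaves a single non-empty bucket, the one with `le="5"`, and `mean_seconds(20.881826560000004, 10)` yields about 2.09.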

Related to #3467

@evenyag
Contributor Author

evenyag commented Mar 26, 2024

I'm going to close this issue, as the performance is tracked in #3467.

@evenyag evenyag closed this as completed Mar 26, 2024
Projects
Status: Done

5 participants