Add logical planner and select caching #79
Conversation
Force-pushed from c0ae261 to fac1a24
Force-pushed from 86ad32f to 5bbf3a8
p.selectors[key] = newSeriesSelector(p.queryable, mint, maxt, lookbackDelta, matchers)
}

return NewFilteredSelector(p.selectors[key], NewFilter(filters))
Maybe we can also cache filtered selectors.
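To illustrate the caching being discussed, here is a minimal, self-contained sketch of a selector cache keyed by a hash of its matchers. All names (`matcher`, `selectorCache`, `hashMatchers`) are simplified stand-ins for the PR's actual types, and `hash/fnv` stands in for the xxhash dependency:

```go
package main

import (
	"fmt"
	"hash/fnv"
	"sort"
)

// matcher is a simplified stand-in for labels.Matcher; the real cache
// stores selector objects rather than strings.
type matcher struct{ name, value string }

type selectorCache struct {
	selectors map[uint64]string
}

// hashMatchers derives a cache key from a matcher set. The slice is
// copied and sorted so matcher order does not change the key.
func hashMatchers(ms []matcher) uint64 {
	ms = append([]matcher(nil), ms...)
	sort.Slice(ms, func(i, j int) bool { return ms[i].name < ms[j].name })
	h := fnv.New64a()
	for _, m := range ms {
		h.Write([]byte(m.name))
		h.Write([]byte{0xff}) // separator to avoid ambiguous concatenations
		h.Write([]byte(m.value))
		h.Write([]byte{0xff})
	}
	return h.Sum64()
}

func (c *selectorCache) get(ms []matcher) (string, bool) {
	s, ok := c.selectors[hashMatchers(ms)]
	return s, ok
}

func main() {
	c := &selectorCache{selectors: map[uint64]string{}}
	key := hashMatchers([]matcher{{"__name__", "metric"}, {"a", "b"}})
	c.selectors[key] = `selector for metric{a="b"}`
	// A later select with the same matchers in any order hits the cache.
	s, ok := c.get([]matcher{{"a", "b"}, {"__name__", "metric"}})
	fmt.Println(ok, s)
}
```

Caching filtered selectors, as suggested above, would amount to folding the filter matchers into the same key.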
Force-pushed from 5bbf3a8 to 5cbb92c
return r
}

// matcherHeap is a set of the most selective label matchers
What does it mean by selective here? Fewer postings?
Most selective means the one that selects the largest number of postings. The idea is to select as many postings as we can in a single select, and then filter the series down in selectors that don't need all postings. Let me know how we can change the wording here to make it clearer.
Maybe we can add something like: selectiveness means how many series are matched, i.e. typically the minimal number of matchers?
Thanks, I modified the comment a bit.
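As a rough sketch of the bookkeeping described above: for each metric name, keep the matcher set with the fewest matchers, since fewer matchers select more series and that select can then serve the metric's other selectors, with filters applied on top. The names here (`matcherSet`, `mostSelective`) are illustrative, not the PR's actual heap implementation:

```go
package main

import "fmt"

// matcherSet pairs a metric name with its (stringified) label matchers.
type matcherSet struct {
	metric   string
	matchers []string
}

// mostSelective keeps, per metric, the matcher set with the fewest
// matchers: the one that selects the most series.
func mostSelective(sets []matcherSet) map[string]matcherSet {
	best := map[string]matcherSet{}
	for _, s := range sets {
		cur, ok := best[s.metric]
		if !ok || len(s.matchers) < len(cur.matchers) {
			best[s.metric] = s
		}
	}
	return best
}

func main() {
	sets := []matcherSet{
		{"metric", []string{`a="b"`, `c="d"`}},
		{"metric", []string{`a="b"`}},
	}
	// The broader selector {a="b"} wins; {a="b", c="d"} is served
	// from it by filtering on c="d".
	fmt.Println(mostSelective(sets)["metric"].matchers)
}
```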
logicalplan/merge_selects.go
Outdated
replacement, found := selectors.findReplacement(l.Name, e.LabelMatchers)
if found {
	// All replacements are done on metrics only,
	// so we can drop the explicit metric name selector.
All replacements are done on metrics only
Cannot get this one. Did you mean all replacements are done on labels other than the metric name?
So what I wanted to say here is that we only replace selectors that match on metric names. Something like sum({a="b"}) will not be processed by this optimizer. Any suggestions on how we can clarify this?
Proposed alternative.
}

func hashMatchers(matchers []*labels.Matcher, mint time.Time, maxt time.Time, delta time.Duration) uint64 {
	sb := xxhash.New()
Having a pool would be nice
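The pool suggestion could look roughly like the sketch below: a `sync.Pool` of hash digests so each `hashMatchers` call reuses state instead of allocating a fresh hasher. `hash/fnv` is used here as a stdlib stand-in for xxhash, and `hashString` is a simplified hypothetical helper:

```go
package main

import (
	"fmt"
	"hash"
	"hash/fnv"
	"sync"
)

// digestPool reuses hash state across calls to avoid allocating a new
// digest per cache-key computation.
var digestPool = sync.Pool{
	New: func() any { return fnv.New64a() },
}

func hashString(s string) uint64 {
	h := digestPool.Get().(hash.Hash64)
	defer digestPool.Put(h)
	h.Reset() // the pooled digest carries state from its previous user
	h.Write([]byte(s))
	return h.Sum64()
}

func main() {
	a := hashString(`metric{a="b"}`)
	b := hashString(`metric{a="b"}`)
	fmt.Println(a == b) // the pooled digest still hashes deterministically
}
```

The `Reset()` before use is the important detail; forgetting it makes pooled digests produce garbage keys.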
Force-pushed from 0fe34e5 to 47b2130
LGTM. We need to resolve conflicts.
Btw, do we have any benchmarks for the optimization? Where did you get the idea for this optimization?
Is thanos-io/thanos#4407 fixed now that we have the selector caching? It seems we cache selectors only, not data.
Thanks for the awesome work! This already looks super impressive. 🚀
Would love to see some benchmarks too! 💪🏻
Some questions/suggestions! 🙂
RunOptimizers([]Optimizer) parser.Expr
}

type Optimizer interface {
Maybe renaming to RuleBasedOptimizer would be good here, in case we choose to implement plan cost estimation in the future? 🙂
Suggested change:
- type Optimizer interface {
+ type RuleBasedOptimizer interface {
Let's maybe introduce such distinctions when we have different optimizer types?
Sure!
👍🏽
}

type Optimizer interface {
	Optimize(parser.Expr) parser.Expr
Perhaps we should also add Explain() methods here for the sake of debuggability?
Yeah I'd love to add this, but maybe in a follow up PR? I am concerned this change might just keep growing.
fine to add later
It'd be such a killer feature to have EXPLAIN for PromQL indeed, seeing how the physical storage is going to be queried. 💯
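The Explain() idea floated above could be sketched as follows; the interface extension and the `mergeSelects` type are hypothetical, with `parser.Expr` replaced by a plain string to keep the sketch self-contained:

```go
package main

import "fmt"

// Optimizer with an Explain method for debuggability; in the engine,
// Optimize would take and return a parser.Expr.
type Optimizer interface {
	Optimize(expr string) string
	Explain() string
}

// mergeSelects is a stand-in for MergeSelectsOptimizer.
type mergeSelects struct{}

func (mergeSelects) Optimize(expr string) string { return expr }

func (mergeSelects) Explain() string {
	return "merges selectors of the same metric and filters the merged result"
}

func main() {
	optimizers := []Optimizer{mergeSelects{}}
	for _, o := range optimizers {
		// An EXPLAIN-style command could print one line per applied rule.
		fmt.Println(o.Explain())
	}
}
```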
"github.com/prometheus/prometheus/promql/parser"
)

type FilteredSelector struct {
So this seems to change the PromQL spec and adds a new filter() function to it. I'm wondering how safe this is; for example, filter could get implemented upstream, in which case it may do something completely different and we end up with diverging specs.
That's true, we are adding a new AST node type, similar to how StepInvariant was added upstream: https://github.com/prometheus/prometheus/blob/96d5a32659f0e3928c10a771e50123fead9828bd/promql/parser/ast.go#L178-L183
However, this is not changing the actual spec of PromQL. It only changes the parsed AST. In the best-case scenario we would have our own structs copied from the AST and work with them. I thought that might be overkill for now, so I decided to start with something simpler.
If something like this was added to upstream PromQL, I would expect some tests to fail somewhere :) I am not sure if there is a better way to prevent such incompatibilities, and I also don't see how exactly they would manifest.
Ack! I see! This makes sense, thanks!
Yes, we have to extend AST if we want to reuse it and avoid transforming types too many times.
"github.com/prometheus/prometheus/promql/parser"
)

// MergeSelectsOptimizer optimizes a binary expression where
So it feels like this optimizer is more of a cache-based optimization rather than a rule-based one? I think we also implement this caching in the physical plan with new storage, and extend PromQL to support it by having a separate filter() for it, which is added by an optimizer rule in the logical plan. It feels like this not only "rewrites the query", but also implements something new.
We definitely should do this selector merge rule + engine cache, but I wonder if filter is the right place to implement the logical plan rule.
This is new to me, so feel free to correct me if I'm getting this wrong somewhere! 🙂
Also, what happens to regex selectors in this case?
Yeah, that's true, we are adding two optimizations in this case. The physical plan can cache series calls, and the logical optimizers make sure the cache-hit ratio is optimized.
I think adding optimizers to the physical plan would not be straightforward, since the physical plan is essentially a set of instantiated operators. We would need to figure out how to transform and rewire them.
Regex selectors should also be matched. It's just that we only match selectors if they are exactly the same. As a further optimization, we can extend the detection logic to know that metric{a="foo"} can be replaced with metric{a=~"f.+"}.
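The detection step for that further optimization could be sketched like this: an equality matcher can be served by a cached regex selector if the regex matches the literal value (with the equality matcher applied afterwards as a filter). `subsumedByRegex` is a hypothetical helper; the anchoring follows PromQL's rule that regex matchers are fully anchored:

```go
package main

import (
	"fmt"
	"regexp"
)

// subsumedByRegex reports whether a cached selector with the regex
// matcher a=~pattern could serve a stricter select for a="literal".
func subsumedByRegex(literal, pattern string) bool {
	// PromQL fully anchors regex matchers, so anchor here as well.
	re, err := regexp.Compile("^(?:" + pattern + ")$")
	if err != nil {
		return false
	}
	return re.MatchString(literal)
}

func main() {
	// metric{a="foo"} can be answered from a cached metric{a=~"f.+"}.
	fmt.Println(subsumedByRegex("foo", "f.+"))
	// metric{a="bar"} cannot.
	fmt.Println(subsumedByRegex("bar", "f.+"))
}
```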
Let's keep it simple -> optimizer changing logical plan
Optimizers should only optimize the logical plan indeed. The physical plan is then constructed from that optimized logical plan and can make certain assumptions while being constructed.
Force-pushed from 47b2130 to 9c30b72
I added a benchmark for a binary query like
Here are the results:
I would expect the optimization to work even better in Thanos, where we fetch data over the network. With a local TSDB, I believe we can reference the same series without copying them in both the numerator and denominator. In Thanos, both selectors would hold copies of the data, which would increase memory usage even more compared to not having the optimization. I also added an explicit flag to enable/disable optimizers. Should this be set to true or false by default?
@yeya24 yes, thanos-io/thanos#4407 should be fixed by this. We don't cache decoded samples; this is hard because we decode chunks in time-based steps. But we do cache fetched series, which means encoded chunks will be cached. Maybe as a next improvement we can figure out how to decode shared chunks only once and reuse them in all operators that need them.
I am wondering if we really need a flag. In which case would we want to disable it?
Ok, let's go with that. I will change the flag to
LGTM! ❤️
Maybe this new option could also cover #66?
logicalplan/plan_test.go
Outdated
{
	name:     "common selectors",
	expr:     `sum(metric{a="b", c="d"}) / sum(metric{a="b"})`,
	expected: `sum(filter([a="b" c="d"], metric{a="b"})) / sum(metric{a="b"})`,
Do we need to keep the old matchers in the filter when the vector selector already has them? For example, I'd expect to see
expected: `sum(filter([c="d"], metric{a="b"})) / sum(metric{a="b"})`,
here? 🤔
That's a good point - I guess we do double selection here which is not a harm - we can optimize this later (:
True, it would be safe and better to strip selectors from filters. It will also make filtering faster. I can follow up with a PR.
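The stripping discussed here is essentially a set difference: drop the filter matchers the vector selector already guarantees, so filter([a="b" c="d"], metric{a="b"}) becomes filter([c="d"], metric{a="b"}). A minimal sketch with stringified matchers and a hypothetical `stripRedundant` helper:

```go
package main

import "fmt"

// stripRedundant removes filter matchers that the selector's own
// matchers already guarantee, leaving only the extra conditions the
// filter actually has to check per series.
func stripRedundant(filters, selector []string) []string {
	have := map[string]bool{}
	for _, m := range selector {
		have[m] = true
	}
	var out []string
	for _, m := range filters {
		if !have[m] {
			out = append(out, m)
		}
	}
	return out
}

func main() {
	filters := []string{`a="b"`, `c="d"`}
	selector := []string{`a="b"`}
	fmt.Println(stripRedundant(filters, selector))
}
```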
This ended up being fairly straightforward, so I implemented it in this PR.
Amazing work! Small nits and LGTM. Can't wait to feed real data to optimizers for optimization decisions 😱 (e.g. imagine the logical plan knowing the series for the next few steps).
}

type Opts struct {
	promql.EngineOpts

	// DisableOptimizers disables query optimizations using logicalPlan.DefaultOptimizers.
	DisableOptimizers bool
My worry is that this will have to be extended to support different optimization modes, possibly with https://yourbasic.org/golang/bitmask-flag-set-clear/
I guess we can still break compatibility and change it in the future if needed.
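The bitmask-style alternative to a single boolean could look like the sketch below. The flag names (`MergeSelects`, `SortMatchers`) are illustrative placeholders, not flags the engine actually defines:

```go
package main

import "fmt"

// OptimizationFlags selects which optimizer rules run, replacing a
// single DisableOptimizers boolean with per-rule toggles.
type OptimizationFlags uint8

const (
	MergeSelects OptimizationFlags = 1 << iota
	SortMatchers

	AllOptimizations = MergeSelects | SortMatchers
)

// Has reports whether flag o is enabled in f.
func (f OptimizationFlags) Has(o OptimizationFlags) bool { return f&o != 0 }

func main() {
	flags := MergeSelects // enable one rule only
	fmt.Println(flags.Has(MergeSelects), flags.Has(SortMatchers))
}
```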
engine/engine.go
Outdated
plan, err := physicalplan.New(expr, q, ts, ts, 0, e.lookbackDelta)
logicalPlan := logicalplan.New(expr, ts, ts)
if e.enableOptimizers {
	logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
Suggested change:
- logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
+ logicalPlan = logicalPlan.Optimize(logicalplan.DefaultOptimizers)
engine/engine_test.go
Outdated
test, err := promql.NewTest(t, tc.load)
testutil.Ok(t, err)
defer test.Close()
for _, withOptimizers := range disableOptimizers {
Suggested change:
- for _, withOptimizers := range disableOptimizers {
+ for _, withoutOptimizers := range disableOptimizers {
// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
I don't really get why this merge selector can work only on __name__? Why do we have to treat the metric name specially here?
It's done mainly to constrain the problem and make it easier to solve. We can generalize it in subsequent iterations; I mainly wanted to prototype something simple, make it safe to use, and match most cases.
// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
Optimize(expr parser.Expr) parser.Expr {
We need to invest in good documentation to make it easy to optimize with just this interface (:
Really awesome stuff. Lots of good discussions!
Force-pushed from 1774d50 to c210d0f
This commit adds the first version of a logical planner with the capability to unify matchers between different selectors. The MergeSelectsOptimizer traverses the AST and identifies the most selective matcher set for each individual metric. It then replaces less selective matchers with the most selective one, and adds additional filters to ensure correctness. The physical plan can then cache results for identical selectors, which leads to fewer network calls and faster series retrieval operations.
Co-authored-by: Bartlomiej Plotka <[email protected]>
Force-pushed from aba7ccd to 2fc8e09
Force-pushed from 2fc8e09 to 0711911
Thanks everyone for the comments, they should all be resolved now. Let's merge this and start playing with it.
💪🏽