
Add logical planner and select caching #79

Merged
merged 7 commits into thanos-io:main
Oct 17, 2022

Conversation

fpetkovski
Copy link
Collaborator

@fpetkovski fpetkovski commented Oct 12, 2022

This commit adds the first version of a logical planner with the
capability to unify matchers between different selectors.

The MergeSelectsOptimizer traverses the AST and identifies the most
selective matcher set for each individual metric. It then replaces
less selective matchers with the most selective one and adds additional
filters to ensure correctness.

The physical plan can then cache results for identical selectors which leads
to fewer network calls and faster series retrieval operations.
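The rewrite described above can be sketched with simplified, hypothetical types; the real optimizer works on the parsed PromQL AST and Prometheus `labels.Matcher` values, but the core idea is a set difference between two matcher sets for the same metric:

```go
package main

import "fmt"

// Matcher is a simplified stand-in for Prometheus' labels.Matcher;
// these types are hypothetical and only illustrate the rewrite.
type Matcher struct{ Name, Value string }

// mergeSelects rewrites two selectors for the same metric so that the
// broader matcher set is selected once and the narrower selector becomes
// a filter over it (e.g. filter([c="d"], metric{a="b"})).
func mergeSelects(narrow, broad []Matcher) (selector, filter []Matcher) {
	seen := make(map[Matcher]bool, len(broad))
	for _, m := range broad {
		seen[m] = true
	}
	// Matchers of the narrow selector not guaranteed by the broad one
	// become the residual filter.
	for _, m := range narrow {
		if !seen[m] {
			filter = append(filter, m)
		}
	}
	return broad, filter
}

func main() {
	// sum(metric{a="b", c="d"}) / sum(metric{a="b"})
	selector, filter := mergeSelects(
		[]Matcher{{"a", "b"}, {"c", "d"}},
		[]Matcher{{"a", "b"}},
	)
	fmt.Println(selector, filter) // one shared select plus a residual filter
}
```

Both sides of the binary expression can then be served from a single select on `metric{a="b"}`, with the left side filtered down by `{c="d"}`.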

@fpetkovski fpetkovski force-pushed the logical-plan branch 2 times, most recently from c0ae261 to fac1a24 Compare October 12, 2022 13:22
@fpetkovski fpetkovski changed the title Logical plan Add logical planner and select caching Oct 12, 2022
@fpetkovski fpetkovski marked this pull request as ready for review October 12, 2022 13:22
@fpetkovski fpetkovski force-pushed the logical-plan branch 4 times, most recently from 86ad32f to 5bbf3a8 Compare October 12, 2022 13:50
p.selectors[key] = newSeriesSelector(p.queryable, mint, maxt, lookbackDelta, matchers)
}

return NewFilteredSelector(p.selectors[key], NewFilter(filters))
Collaborator Author

Maybe we can also cache filtered selectors.

logicalplan/sort_matchers.go (outdated, resolved)
logicalplan/merge_selects.go (outdated, resolved)
logicalplan/merge_selects.go (outdated, resolved)
return r
}

// matcherHeap is a set of the most selective label matchers
Contributor

What does it mean by selective here? Fewer postings?

Collaborator Author

Most selective here means the one that selects the largest number of postings. The idea is to select as many postings as we can in a single select, and then filter down the series in selectors that don't need all postings. Let me know how we can change the wording here to make it clearer.

Member

Maybe we can add something like: selectiveness means how many series are matched, i.e. typically the minimal number of matchers?

Collaborator Author

Thanks, I modified the comment a bit.
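The heuristic discussed in this thread could be sketched as below; `matcherSet` and the length-based comparison are illustrative assumptions, not the engine's actual `matcherHeap` implementation. For each metric name it keeps the broadest matcher set seen so far, on the heuristic that fewer matchers usually select more series:

```go
package main

import "fmt"

// matcherSet is a hypothetical, simplified representation of the
// label matchers attached to one selector.
type matcherSet []string

// matcherHeap keeps, per metric name, the matcher set that selects the
// most series; narrower selectors are later served from it by filtering.
type matcherHeap map[string]matcherSet

func (h matcherHeap) add(metric string, s matcherSet) {
	// Keep the set with fewer matchers: it is less restrictive and
	// therefore matches at least as many series.
	if cur, ok := h[metric]; !ok || len(s) < len(cur) {
		h[metric] = s
	}
}

func main() {
	h := matcherHeap{}
	h.add("http_requests_total", matcherSet{`a="b"`, `c="d"`})
	h.add("http_requests_total", matcherSet{`a="b"`})
	fmt.Println(h["http_requests_total"]) // the broader set wins
}
```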

replacement, found := selectors.findReplacement(l.Name, e.LabelMatchers)
if found {
// All replacements are done on metrics only,
// so we can drop the explicit metric name selector.
Contributor

"All replacements are done on metrics only" - I cannot get this one. Did you mean all replacements are done on labels other than the metric name?

Collaborator Author

So what I wanted to say here is that we only replace selectors that match on metric names. Something like sum({a="b"}) will not be processed by this optimizer. Any suggestions on how we can clarify this?

Member

Proposed alternative.

logicalplan/plan_test.go (resolved)
physicalplan/plan.go (outdated, resolved)
}

func hashMatchers(matchers []*labels.Matcher, mint time.Time, maxt time.Time, delta time.Duration) uint64 {
sb := xxhash.New()
Contributor

Having a pool would be nice

@fpetkovski fpetkovski force-pushed the logical-plan branch 2 times, most recently from 0fe34e5 to 47b2130 Compare October 14, 2022 11:05
@yeya24 (Contributor) left a comment

LGTM. We need to resolve conflicts.
Btw, any benchmarks for the optimization we've done? Where did you get the idea for this optimization?

Is thanos-io/thanos#4407 fixed after we have the selector caching? Seems we cache selectors only, not data.

@saswatamcode (Member) left a comment

Thanks for the awesome work! This already looks super impressive. 🚀
Would love to see some benchmarks too! 💪🏻

Some questions/suggestions! 🙂

RunOptimizers([]Optimizer) parser.Expr
}

type Optimizer interface {
Member

Maybe renaming to RuleBasedOptimizer would be good here? In case we choose to implement plan cost estimation in the future? 🙂

Suggested change
type Optimizer interface {
type RuleBasedOptimizer interface {

Collaborator Author

Let's maybe introduce such distinctions when we have different optimizer types?
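The interface-and-chain shape under discussion can be sketched as below. The names mirror the PR (`Optimizer`, `RunOptimizers`), but `Expr` is a string placeholder for `parser.Expr` and the toy rule is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Expr stands in for parser.Expr; real optimizers rewrite the PromQL AST,
// while this sketch rewrites strings just to show the chain shape.
type Expr = string

type Optimizer interface {
	Optimize(Expr) Expr
}

// Plan mirrors the logical plan's RunOptimizers: rules are applied in
// order and the rewritten expression is threaded through.
type Plan struct{ expr Expr }

func (p Plan) RunOptimizers(optimizers []Optimizer) Expr {
	e := p.expr
	for _, o := range optimizers {
		e = o.Optimize(e)
	}
	return e
}

// uppercase is a toy rule standing in for MergeSelectsOptimizer and friends.
type uppercase struct{}

func (uppercase) Optimize(e Expr) Expr { return strings.ToUpper(e) }

func main() {
	fmt.Println(Plan{expr: "sum(metric)"}.RunOptimizers([]Optimizer{uppercase{}}))
}
```

Because each rule only sees and returns an expression, new rule-based optimizers can be added without touching the plan itself.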

Member

Sure!

Member

👍🏽

}

type Optimizer interface {
Optimize(parser.Expr) parser.Expr
Member

Perhaps we should also add Explain() methods here for sake of debuggability?

Collaborator Author

Yeah I'd love to add this, but maybe in a follow up PR? I am concerned this change might just keep growing.

Member

fine to add later

It'd be such a killer feature to have EXPLAIN for PromQL indeed. Seeing how the physical storage is going to be queried. 💯

"github.com/prometheus/prometheus/promql/parser"
)

type FilteredSelector struct {
Member

So this seems to change the PromQL spec and adds a new filter() function to it. I'm wondering how safe this is; for example, if filter is implemented upstream, it may do something completely different and we end up with diverging specs.

@fpetkovski (Collaborator Author) Oct 17, 2022

That's true, we are adding a new AST node type, similar to how StepInvariant was added upstream: https://github.com/prometheus/prometheus/blob/96d5a32659f0e3928c10a771e50123fead9828bd/promql/parser/ast.go#L178-L183

However, this is not changing the actual spec of PromQL. It only changes the parsed AST. In the best case we would have our own structs copied from the AST and work with them. I thought that might be overkill for now, so I decided to start with something simpler.

If something like this was added to upstream PromQL, I would expect some tests to fail somewhere :) I am not sure if there is a better way to prevent such incompatibilities, and I also don't see how exactly they would manifest.

Member

Ack! I see! This makes sense, thanks!

Member

Yes, we have to extend AST if we want to reuse it and avoid transforming types too many times.

"github.com/prometheus/prometheus/promql/parser"
)

// MergeSelectsOptimizer optimizes a binary expression where
@saswatamcode (Member) Oct 16, 2022

So it feels like this optimizer is more of a cache-based optimization rather than a rule-based one? I think we also implement this caching in the physical plan with new storage, and extend PromQL to support it by having a separate filter() for it, which is added by an optimizer rule in the logical plan. It feels like this not only "rewrites the query", but also implements something new.

We definitely should do this selector merge rule + engine cache, but I wonder if filter is the right place to implement the logical plan rule.
This is new to me, so feel free to correct me if I'm getting this wrong somewhere! 🙂

Also, what happens to regex selectors in this case?

Collaborator Author

Yeah that's true, we are adding two optimizations in this case. The physical plan can cache series calls, and logical optimizers make sure that the cache-hit ratio is optimized.

I think adding optimizers to the physical plan would not be straightforward, since the physical plan is essentially a set of instantiated operators. We would need to figure out how to transform and rewire them.

Collaborator Author

Regex selectors should also be matched. It's just that we only match selectors if they are exactly the same. As a further optimization, we can extend the detection logic to know that metric{a="foo"} can be replaced with metric{a=~"f.+"}.

Member

Let's keep it simple -> optimizer changing logical plan

Optimizers should only optimize the logical plan indeed. The physical plan is then constructed from that optimized logical plan and can make certain assumptions while being constructed.

@fpetkovski (Collaborator Author)

I added a benchmark for a binary query like

sum(http_requests_total{code="200"}) / sum(http_requests_total)

Here are the results:

goos: darwin
goarch: arm64
pkg: github.com/thanos-community/promql-engine/engine
BenchmarkMergeSelectorsOptimizer
BenchmarkMergeSelectorsOptimizer/withoutOptimizers
BenchmarkMergeSelectorsOptimizer/withoutOptimizers-8         	      37	  31050148 ns/op	44220461 B/op	  275174 allocs/op
BenchmarkMergeSelectorsOptimizer/withOptimizers
BenchmarkMergeSelectorsOptimizer/withOptimizers-8            	      40	  27853710 ns/op	40639897 B/op	  225706 allocs/op
PASS

I would expect the optimization to work even better in Thanos where we fetch data over the network. With a local TSDB, I believe we can reference the same series without copying them in both the numerator and denominator. In Thanos, both selectors would hold copies of the data, which would increase memory usage even more compared to not having the optimization.

I also added an explicit flag to enable/disable optimizers. Should this be set to true or false by default?

@fpetkovski (Collaborator Author)

@yeya24 yes thanos-io/thanos#4407 should be fixed by this. We don't cache decoded samples, this is hard because we decode chunks in time-based steps. But we do cache fetched series, which means encoded chunks will be cached. Maybe as a next improvement, we can figure out how to only decode shared chunks once and reuse them in all operators that need them.
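The selector caching described here (and in the `p.selectors[key]` snippet earlier in the thread) can be sketched with simplified, hypothetical types: identical selectors, keyed by a hash of matchers and time range, share one series selector so the storage is queried only once:

```go
package main

import "fmt"

// seriesSelector is a hypothetical stand-in for the engine's series
// selector; the key would come from something like hashMatchers.
type seriesSelector struct{ key uint64 }

// selectorCache returns an existing selector for an identical select,
// creating one only on a cache miss.
type selectorCache struct {
	selectors map[uint64]*seriesSelector
	created   int // counts actual storage selectors created
}

func (c *selectorCache) get(key uint64) *seriesSelector {
	if s, ok := c.selectors[key]; ok {
		return s // cache hit: no extra Select() against storage
	}
	c.created++
	s := &seriesSelector{key: key}
	c.selectors[key] = s
	return s
}

func main() {
	c := &selectorCache{selectors: map[uint64]*seriesSelector{}}
	a, b := c.get(42), c.get(42)
	fmt.Println(a == b, c.created) // both sides of the query share one selector
}
```

This is why the MergeSelectsOptimizer matters: by rewriting selectors to be identical, it raises the cache-hit ratio of this map.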

@yeya24 (Contributor)

yeya24 commented Oct 17, 2022

I am wondering if we really need a flag. In which cases would we want to disable it?

@fpetkovski (Collaborator Author)

Ok, let's go with that. I will change the flag to DisableOptimizers. The use case I see is for debugging purposes if some queries return wrong results.

@saswatamcode (Member) left a comment

LGTM! ❤️

@GiedriusS (Member)

GiedriusS commented Oct 17, 2022

Ok, let's go with that. I will change the flag to DisableOptimizers. The use case I see is for debugging purposes if some queries return wrong results.

Maybe this new option could also cover #66?

{
name: "common selectors",
expr: `sum(metric{a="b", c="d"}) / sum(metric{a="b"})`,
expected: `sum(filter([a="b" c="d"], metric{a="b"})) / sum(metric{a="b"})`,
Member

Do we need to keep the old matchers in the filter when the vector selector has them? For example, I'd expect to see

			expected: `sum(filter([c="d"], metric{a="b"})) / sum(metric{a="b"})`,

Here? 🤔

Member

That's a good point - I guess we do double selection here which is not a harm - we can optimize this later (:

@fpetkovski (Collaborator Author) Oct 17, 2022

True, it would be safe and better to strip selectors from filters. It will also make filtering faster. I can follow up with a PR.

Collaborator Author

This ended up being fairly straightforward, so I implemented it in this PR.

@bwplotka (Member) left a comment

Amazing work! Small nits and LGTM - can't wait to feed real data to optimizers for optimization decisions 😱

(e.g. imagine the logical plan knew the series for the next few steps)

}

type Opts struct {
promql.EngineOpts

// DisableOptimizers disables query optimizations using logicalPlan.DefaultOptimizers.
DisableOptimizers bool
Member

My worry is that this has to be extended to have different optimization modes, possibly with https://yourbasic.org/golang/bitmask-flag-set-clear/

I guess we can still break compatibility and change it in the future if needed.
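The bitmask-style flag set linked above could look like this sketch; the mode names are hypothetical (the PR itself only has a single DisableOptimizers bool):

```go
package main

import "fmt"

// OptimizerMode is a bitmask of individually toggleable optimizations,
// in the style of the linked article. Names are illustrative only.
type OptimizerMode uint8

const (
	MergeSelects OptimizerMode = 1 << iota
	SortMatchers
)

// Has reports whether a given optimization is enabled in the mode.
func (m OptimizerMode) Has(flag OptimizerMode) bool { return m&flag != 0 }

func main() {
	mode := MergeSelects | SortMatchers // enable both
	mode &^= SortMatchers               // clear one optimization
	fmt.Println(mode.Has(MergeSelects), mode.Has(SortMatchers))
}
```

Switching from a bool to such a mask later would indeed be a breaking change to Opts, which is the compatibility concern raised here.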

engine/engine.go Outdated
plan, err := physicalplan.New(expr, q, ts, ts, 0, e.lookbackDelta)
logicalPlan := logicalplan.New(expr, ts, ts)
if e.enableOptimizers {
logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
Member

Suggested change
logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
logicalPlan = logicalPlan.Optimize(logicalplan.DefaultOptimizers)

test, err := promql.NewTest(t, tc.load)
testutil.Ok(t, err)
defer test.Close()
for _, withOptimizers := range disableOptimizers {
Member

Suggested change
for _, withOptimizers := range disableOptimizers {
for _, withoutOptimizers := range disableOptimizers {

// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
Member

I don't really get why this merge selector can only work on __name__. Why do we have to treat the metric name specially here?

Collaborator Author

It's done mainly to constrain the problem and make it easier to solve. We can generalize it in subsequent iterations; I mainly wanted to prototype something simple and safe to use that matches most cases.

// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
Member

Optimize(expr parser.Expr) parser.Expr {

We need to invest in good documentation to make it easy to optimize with just this interface (:

@metalmatze left a comment

Really awesome stuff. Lots of good discussions!

@fpetkovski (Collaborator Author)

Thanks everyone for the comments, they should all be resolved now. Let's merge this and start playing with it.

@fpetkovski fpetkovski merged commit 67a4593 into thanos-io:main Oct 17, 2022
@bwplotka (Member)

💪🏽
