
Add logical planner and select caching #79

Merged
merged 7 commits into thanos-io:main
Oct 17, 2022

Conversation

fpetkovski
Copy link
Collaborator

@fpetkovski fpetkovski commented Oct 12, 2022

This commit adds the first version of a logical planner with the
capability to unify matchers between different selectors.

The MergeSelectsOptimizer traverses the AST and identifies the most
selective matcher set for each individual metric. It then replaces
less selective matchers with the most selective one and adds additional
filters to ensure correctness.

The physical plan can then cache results for identical selectors which leads
to fewer network calls and faster series retrieval operations.
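The rewrite described above can be sketched with simplified, hypothetical types; the real optimizer works on the parsed PromQL AST and Prometheus `labels.Matcher` values, but the core idea is a set difference between two matcher sets for the same metric:

```go
package main

import "fmt"

// Matcher is a simplified stand-in for Prometheus' labels.Matcher;
// these types are hypothetical and only illustrate the rewrite.
type Matcher struct{ Name, Value string }

// mergeSelects rewrites two selectors for the same metric so that the
// broader matcher set is selected once and the narrower selector becomes
// a filter over it (e.g. filter([c="d"], metric{a="b"})).
func mergeSelects(narrow, broad []Matcher) (selector, filter []Matcher) {
	seen := make(map[Matcher]bool, len(broad))
	for _, m := range broad {
		seen[m] = true
	}
	// Matchers of the narrow selector not guaranteed by the broad one
	// become the residual filter.
	for _, m := range narrow {
		if !seen[m] {
			filter = append(filter, m)
		}
	}
	return broad, filter
}

func main() {
	// sum(metric{a="b", c="d"}) / sum(metric{a="b"})
	selector, filter := mergeSelects(
		[]Matcher{{"a", "b"}, {"c", "d"}},
		[]Matcher{{"a", "b"}},
	)
	fmt.Println(selector, filter) // one shared select plus a residual filter
}
```

Both sides of the binary expression can then be served from a single select on `metric{a="b"}`, with the left side filtered down by `{c="d"}`.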

@fpetkovski fpetkovski force-pushed the logical-plan branch 2 times, most recently from c0ae261 to fac1a24 Compare October 12, 2022 13:22
@fpetkovski fpetkovski changed the title Logical plan Add logical planner and select caching Oct 12, 2022
@fpetkovski fpetkovski marked this pull request as ready for review October 12, 2022 13:22
@fpetkovski fpetkovski force-pushed the logical-plan branch 4 times, most recently from 86ad32f to 5bbf3a8 Compare October 12, 2022 13:50
p.selectors[key] = newSeriesSelector(p.queryable, mint, maxt, lookbackDelta, matchers)
}

return NewFilteredSelector(p.selectors[key], NewFilter(filters))
Collaborator Author

Maybe we can also cache filtered selectors.

logicalplan/sort_matchers.go (outdated, resolved)
logicalplan/merge_selects.go (outdated, resolved)
logicalplan/merge_selects.go (outdated, resolved)
return r
}

// matcherHeap is a set of the most selective label matchers
Contributor

What does it mean by selective here? Fewer postings?

Collaborator Author

Most selective here means the one that selects the largest number of postings. The idea is to select as many postings as we can in a single select, and then filter down the series in selectors that don't need all postings. Let me know how we can change the wording here to make it clearer.

Member

Maybe we can add something like: selectiveness means how many series are matched, i.e. typically the minimal number of matchers?

Collaborator Author

Thanks, I modified the comment a bit.
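The heuristic discussed in this thread could be sketched as below; `matcherSet` and the length-based comparison are illustrative assumptions, not the engine's actual `matcherHeap` implementation. For each metric name it keeps the broadest matcher set seen so far, on the heuristic that fewer matchers usually select more series:

```go
package main

import "fmt"

// matcherSet is a hypothetical, simplified representation of the
// label matchers attached to one selector.
type matcherSet []string

// matcherHeap keeps, per metric name, the matcher set that selects the
// most series; narrower selectors are later served from it by filtering.
type matcherHeap map[string]matcherSet

func (h matcherHeap) add(metric string, s matcherSet) {
	// Keep the set with fewer matchers: it is less restrictive and
	// therefore matches at least as many series.
	if cur, ok := h[metric]; !ok || len(s) < len(cur) {
		h[metric] = s
	}
}

func main() {
	h := matcherHeap{}
	h.add("http_requests_total", matcherSet{`a="b"`, `c="d"`})
	h.add("http_requests_total", matcherSet{`a="b"`})
	fmt.Println(h["http_requests_total"]) // the broader set wins
}
```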

replacement, found := selectors.findReplacement(l.Name, e.LabelMatchers)
if found {
// All replacements are done on metrics only,
// so we can drop the explicit metric name selector.
Contributor

"All replacements are done on metrics only" - I cannot get this one. Did you mean all replacements are done on labels other than the metric name?

Collaborator Author

So what I wanted to say here is that we only replace selectors that match on metric names. Something like sum({a="b"}) will not be processed by this optimizer. Any suggestions on how we can clarify this?

Member

Proposed alternative.

logicalplan/plan_test.go (resolved)
physicalplan/plan.go (outdated, resolved)
}

func hashMatchers(matchers []*labels.Matcher, mint time.Time, maxt time.Time, delta time.Duration) uint64 {
sb := xxhash.New()
Contributor

Having a pool would be nice

@fpetkovski fpetkovski force-pushed the logical-plan branch 2 times, most recently from 0fe34e5 to 47b2130 Compare October 14, 2022 11:05
@yeya24 (Contributor) left a comment

LGTM. We need to resolve conflicts.
Btw, any benchmarks for the optimization we've done? Where did you get the idea for this optimization?

Is thanos-io/thanos#4407 fixed after we have the selector caching? Seems we cache selectors only, not data.

@saswatamcode (Member) left a comment

Thanks for the awesome work! This already looks super impressive. 🚀
Would love to see some benchmarks too! 💪🏻

Some questions/suggestions! 🙂

RunOptimizers([]Optimizer) parser.Expr
}

type Optimizer interface {
Member

Maybe renaming to RuleBasedOptimizer would be good here? In case we choose to implement plan cost estimation in the future? 🙂

Suggested change
type Optimizer interface {
type RuleBasedOptimizer interface {

Collaborator Author

Let's maybe introduce such distinctions when we have different optimizer types?
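The interface-and-chain shape under discussion can be sketched as below. The names mirror the PR (`Optimizer`, `RunOptimizers`), but `Expr` is a string placeholder for `parser.Expr` and the toy rule is hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// Expr stands in for parser.Expr; real optimizers rewrite the PromQL AST,
// while this sketch rewrites strings just to show the chain shape.
type Expr = string

type Optimizer interface {
	Optimize(Expr) Expr
}

// Plan mirrors the logical plan's RunOptimizers: rules are applied in
// order and the rewritten expression is threaded through.
type Plan struct{ expr Expr }

func (p Plan) RunOptimizers(optimizers []Optimizer) Expr {
	e := p.expr
	for _, o := range optimizers {
		e = o.Optimize(e)
	}
	return e
}

// uppercase is a toy rule standing in for MergeSelectsOptimizer and friends.
type uppercase struct{}

func (uppercase) Optimize(e Expr) Expr { return strings.ToUpper(e) }

func main() {
	fmt.Println(Plan{expr: "sum(metric)"}.RunOptimizers([]Optimizer{uppercase{}}))
}
```

Because each rule only sees and returns an expression, new rule-based optimizers can be added without touching the plan itself.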

Member

Sure!

Member

👍🏽

}

type Optimizer interface {
Optimize(parser.Expr) parser.Expr
Member

Perhaps we should also add Explain() methods here for sake of debuggability?

Collaborator Author

Yeah I'd love to add this, but maybe in a follow up PR? I am concerned this change might just keep growing.

Member

fine to add later

It'd be such a killer feature to have EXPLAIN for PromQL indeed. Seeing how the physical storage is going to be queried. 💯

"github.com/prometheus/prometheus/promql/parser"
)

type FilteredSelector struct {
Member

So this seems to change the PromQL spec and adds a new filter() function to it. I'm wondering how safe this is; for example, if filter is implemented upstream, it may do something completely different and we end up with diverging specs.

@fpetkovski (Collaborator Author) Oct 17, 2022

That's true, we are adding a new AST node type, similar to how StepInvariant was added upstream: https://github.com/prometheus/prometheus/blob/96d5a32659f0e3928c10a771e50123fead9828bd/promql/parser/ast.go#L178-L183

However, this is not changing the actual spec of PromQL. It only changes the parsed AST. In the best case we would have our own structs copied from the AST and work with them. I thought that might be overkill for now, so I decided to start with something simpler.

If something like this was added to upstream PromQL, I would expect some tests to fail somewhere :) I am not sure if there is a better way to prevent such incompatibilities, and I also don't see how exactly they would manifest.

Member

Ack! I see! This makes sense, thanks!

Member

Yes, we have to extend AST if we want to reuse it and avoid transforming types too many times.

"github.com/prometheus/prometheus/promql/parser"
)

// MergeSelectsOptimizer optimizes a binary expression where
@saswatamcode (Member) Oct 16, 2022

So it feels like this optimizer is more of a cache-based optimization rather than a rule-based one? I think we also implement this caching in the physical plan with new storage, and extend PromQL to support it by having a separate filter() for it, which is added by an optimizer rule in the logical plan. It feels like this not only "rewrites the query", but also implements something new.

We definitely should do this selector merge rule + engine cache, but I wonder if filter is the right place to implement the logical plan rule.
This is new to me, so feel free to correct me if I'm getting this wrong somewhere! 🙂

Also, what happens to regex selectors in this case?

Collaborator Author

Yeah that's true, we are adding two optimizations in this case. The physical plan can cache series calls, and logical optimizers make sure that the cache-hit ratio is optimized.

I think adding optimizers to the physical plan would not be straightforward, since the physical plan is essentially a set of instantiated operators. We would need to figure out how to transform and rewire them.

Collaborator Author

Regex selectors should also be matched. It's just that we only match selectors if they are exactly the same. As a further optimization, we can extend the detection logic to know that metric{a="foo"} can be replaced with metric{a=~"f.+"}.

Member

Let's keep it simple -> optimizer changing logical plan

Optimizers should only optimize the logical plan indeed. The physical plan is then constructed from that optimized logical plan and can make certain assumptions while being constructed.

@fpetkovski (Collaborator Author)

I added a benchmark for a binary query like

sum(http_requests_total{code="200"}) / sum(http_requests_total)

Here are the results:

goos: darwin
goarch: arm64
pkg: github.com/thanos-community/promql-engine/engine
BenchmarkMergeSelectorsOptimizer
BenchmarkMergeSelectorsOptimizer/withoutOptimizers
BenchmarkMergeSelectorsOptimizer/withoutOptimizers-8         	      37	  31050148 ns/op	44220461 B/op	  275174 allocs/op
BenchmarkMergeSelectorsOptimizer/withOptimizers
BenchmarkMergeSelectorsOptimizer/withOptimizers-8            	      40	  27853710 ns/op	40639897 B/op	  225706 allocs/op
PASS

I would expect the optimization to work even better in Thanos where we fetch data over the network. With a local TSDB, I believe we can reference the same series without copying them in both the numerator and denominator. In Thanos, both selectors would hold copies of the data, which would increase memory usage even more compared to not having the optimization.

I also added an explicit flag to enable/disable optimizers. Should this be set to true or false by default?

@fpetkovski (Collaborator Author)

@yeya24 yes thanos-io/thanos#4407 should be fixed by this. We don't cache decoded samples, this is hard because we decode chunks in time-based steps. But we do cache fetched series, which means encoded chunks will be cached. Maybe as a next improvement, we can figure out how to only decode shared chunks once and reuse them in all operators that need them.
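The selector caching described here (and in the `p.selectors[key]` snippet earlier in the thread) can be sketched with simplified, hypothetical types: identical selectors, keyed by a hash of matchers and time range, share one series selector so the storage is queried only once:

```go
package main

import "fmt"

// seriesSelector is a hypothetical stand-in for the engine's series
// selector; the key would come from something like hashMatchers.
type seriesSelector struct{ key uint64 }

// selectorCache returns an existing selector for an identical select,
// creating one only on a cache miss.
type selectorCache struct {
	selectors map[uint64]*seriesSelector
	created   int // counts actual storage selectors created
}

func (c *selectorCache) get(key uint64) *seriesSelector {
	if s, ok := c.selectors[key]; ok {
		return s // cache hit: no extra Select() against storage
	}
	c.created++
	s := &seriesSelector{key: key}
	c.selectors[key] = s
	return s
}

func main() {
	c := &selectorCache{selectors: map[uint64]*seriesSelector{}}
	a, b := c.get(42), c.get(42)
	fmt.Println(a == b, c.created) // both sides of the query share one selector
}
```

This is why the MergeSelectsOptimizer matters: by rewriting selectors to be identical, it raises the cache-hit ratio of this map.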

@yeya24 (Contributor)

yeya24 commented Oct 17, 2022

I am wondering if we really need a flag. In which cases would we want to disable it?

@fpetkovski (Collaborator Author)

Ok, let's go with that. I will change the flag to DisableOptimizers. The use case I see is for debugging purposes if some queries return wrong results.

@saswatamcode (Member) left a comment

LGTM! ❤️

@GiedriusS (Member)

GiedriusS commented Oct 17, 2022

Ok, let's go with that. I will change the flag to DisableOptimizers. The use case I see is for debugging purposes if some queries return wrong results.

Maybe this new option could also cover #66?

{
name: "common selectors",
expr: `sum(metric{a="b", c="d"}) / sum(metric{a="b"})`,
expected: `sum(filter([a="b" c="d"], metric{a="b"})) / sum(metric{a="b"})`,
Member

Do we need to keep the old matchers in the filter when the vector selector has them? For example, I'd expect to see

			expected: `sum(filter([c="d"], metric{a="b"})) / sum(metric{a="b"})`,

Here? 🤔

Member

That's a good point - I guess we do double selection here which is not a harm - we can optimize this later (:

@fpetkovski (Collaborator Author) Oct 17, 2022

True, it would be safe and better to strip selectors from filters. It will also make filtering faster. I can follow up with a PR.

Collaborator Author

This ended up being fairly straightforward, so I implemented it in this PR.

@bwplotka (Member) left a comment

Amazing work! Small nits and LGTM - can't wait to feed real data to optimizers for optimization decisions 😱

(e.g. imagine the logical plan knew the series for the next few steps)

}

type Opts struct {
promql.EngineOpts

// DisableOptimizers disables query optimizations using logicalPlan.DefaultOptimizers.
DisableOptimizers bool
Member

My worry is that this has to be extended to have different optimization modes, possibly with https://yourbasic.org/golang/bitmask-flag-set-clear/

I guess we can still break compatibility and change it in the future if needed.
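The bitmask-style flag set linked above could look like this sketch; the mode names are hypothetical (the PR itself only has a single DisableOptimizers bool):

```go
package main

import "fmt"

// OptimizerMode is a bitmask of individually toggleable optimizations,
// in the style of the linked article. Names are illustrative only.
type OptimizerMode uint8

const (
	MergeSelects OptimizerMode = 1 << iota
	SortMatchers
)

// Has reports whether a given optimization is enabled in the mode.
func (m OptimizerMode) Has(flag OptimizerMode) bool { return m&flag != 0 }

func main() {
	mode := MergeSelects | SortMatchers // enable both
	mode &^= SortMatchers               // clear one optimization
	fmt.Println(mode.Has(MergeSelects), mode.Has(SortMatchers))
}
```

Switching from a bool to such a mask later would indeed be a breaking change to Opts, which is the compatibility concern raised here.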

engine/engine.go Outdated
plan, err := physicalplan.New(expr, q, ts, ts, 0, e.lookbackDelta)
logicalPlan := logicalplan.New(expr, ts, ts)
if e.enableOptimizers {
logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
Member

Suggested change
logicalPlan = logicalPlan.RunOptimizers(logicalplan.DefaultOptimizers)
logicalPlan = logicalPlan.Optimize(logicalplan.DefaultOptimizers)

test, err := promql.NewTest(t, tc.load)
testutil.Ok(t, err)
defer test.Close()
for _, withOptimizers := range disableOptimizers {
Member

Suggested change
for _, withOptimizers := range disableOptimizers {
for _, withoutOptimizers := range disableOptimizers {

// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
Member

I don't really get why this merge selector can only work on __name__. Why do we have to treat the metric name specially here?

Collaborator Author

It's done mainly to constrain the problem and make it easier to solve. We can generalize it in subsequent iterations; I mainly wanted to prototype something simple and safe to use that matches most cases.

// and apply an additional filter for {c="d"}.
type MergeSelectsOptimizer struct{}

func (m MergeSelectsOptimizer) Optimize(expr parser.Expr) parser.Expr {
Member

Optimize(expr parser.Expr) parser.Expr {

We need to invest in good documentation to make it easy to optimize with just this interface (:

@metalmatze left a comment

Really awesome stuff. Lots of good discussions!

@fpetkovski (Collaborator Author)

Thanks everyone for the comments, they should all be resolved now. Let's merge this and start playing with it.

@fpetkovski fpetkovski merged commit 67a4593 into thanos-io:main Oct 17, 2022
@bwplotka (Member)

💪🏽
