-
Notifications
You must be signed in to change notification settings - Fork 72
Improve block querying with a binary join and sync iterator #807
Improve block querying with a binary join and sync iterator #807
Conversation
5fd4c58
to
9e40582
Compare
for i := definitionLevel + 1; i < len(t); i++ { | ||
t[i] = -1 | ||
// the following is nextSlow() unrolled | ||
switch repetitionLevel { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know what's going on here ? why 5 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think might be the highest definition level the otel schema has?
Probably @joe-elliott might be able to anwser that better.
In the grand scheme of things, the gain is probably not worth the unrolling for us, happy to put that back to the original (grafana/tempo#2412).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@simonswine is correct. it's just an arbitrary depth that is enough for our schema.
i don't recall the numbers but profiling heavy "needle in the haystack" queries showed a large %age of the time in this function (which is why we unrolled it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm fine if we don't overflow our current definition/repetition level for the 3 column we use this iterator with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is inline with limit imposed by RowNumber [6]int64
6f92b4a
to
38d0c1a
Compare
}, | ||
{ | ||
name: "time before results", | ||
expectedResultCount: 0, | ||
timePredicate: NewIntBetweenPredicate(-10, -1), | ||
seriesPageReads: 1, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cyriltovena We could prevent this single read if we would push down TimeNanos predicates into the SeriesID iterator.
I don't think we should do this as that PR though 😆
This predicate still wouldn't work all that well with a fully SeriesID sorted block (so accress row groups). But let's fix that separately
A call to Next/Seek after a close should also yield a context cancelled.
38d0c1a
to
d34f635
Compare
The difference in the
But getting rid of the async aspect is beneficial stopping overreading. A bigger impact is on the
Left --- / Right +++ The impact to the overall query time is no longer so significant, this was probably due a bug that missed out on some matches. |
pkg/phlaredb/query/iters.go
Outdated
} | ||
} else { | ||
// the right value can't be smaller than the left one because we seeked beyond it | ||
panic("not expected to happen") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you make the panic message more specific ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in 48e524a
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
…phlare#807) * Pulll tempo sync * BinaryJoinIterator * Remove unused code * Make use of the BinaryJoinIterator and remove JoinIterator * Correctly log the ReadPages on span * Use load on the atomic page stats * Get tests from tempo * Move map predicate and base it off the min/max This predicate still wouldn't work all that well with a fully SeriesID sorted block (so accress row groups). But let's fix that separately * Fix bug when we iterate left iterator based on a higher right result * Test BinaryJoinIterator thoroughly * Optimize when the Seek destination is only a single row away * Fix tests comments * Validate that page reads are as expected * Fix closing and cancellation A call to Next/Seek after a close should also yield a context cancelled. * Do cancel it in the right place * Improve panic messaging * Add nextSlow and uncomment test
This should avoid over reading at then end of the join (with Tempo's sync iterator) and also make sure it seeks before looking into the second iterator of the join. (Binary Join iterator).