Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ESQL] Make query wrapped by SingleValueQuery cacheable #110116

Merged
merged 9 commits into from
Jul 17, 2024

Conversation

iverase
Copy link
Contributor

@iverase iverase commented Jun 25, 2024

This commit simplifies SingleValueQuery by introducing a new lucene query called SingleValueMatchQuery. This query iterates over doc values and return only the fields that are single-valued. Then SingleValueQuery can generate a lucene query which is a boolean query with two filters, one with the original query and the other with a SingleValueMatchQuery.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 25, 2024
@elasticsearchmachine
Copy link
Collaborator

Hi @iverase, I've created a changelog YAML for you.


@Override
public Scorer get(long leadCost) {
TwoPhaseIterator iterator = new TwoPhaseIterator(DocIdSetIterator.all(maxDoc)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can do something better here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hand rolling each of the subclasses of these because I was worried the indirection from the Predicate would be expensive so deep in the hot path.

For doing something better, do you mean the DISI.all call? I imagine it's ok because this is always going to be used with another query that's going to lead iteration naturally. I think the worst case scenario is when only a few docs have multiple values and the other query matches almost everything. I feel like we can live with it.

@iverase
Copy link
Contributor Author

iverase commented Jul 15, 2024

@elasticmachine update branch


@Override
public Scorer get(long leadCost) {
TwoPhaseIterator iterator = new TwoPhaseIterator(DocIdSetIterator.all(maxDoc)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was hand rolling each of the subclasses of these because I was worried the indirection from the Predicate would be expensive so deep in the hot path.

For doing something better, do you mean the DISI.all call? I imagine it's ok because this is always going to be used with another query that's going to lead iteration naturally. I think the worst case scenario is when only a few docs have multiple values and the other query matches almost everything. I feel like we can live with it.

assertThat(builder.stats().ordinalsMultiNoApprox(), equalTo(0));
assertThat(builder.stats().ordinalsMultiApprox(), equalTo(0));
assertThat(builder.stats().bytesApprox(), equalTo(0));
assertThat(builder.stats().bytesNoApprox(), equalTo(0));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought the stats would be useful in debugging slow queries. It's certainly nice to know things like "this rewrote itself to match all in 7/8 segments".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I don't think it's a blocker.

@alex-spies alex-spies added :Analytics/ES|QL AKA ESQL and removed :Analytics/EQL EQL querying labels Jul 15, 2024
@iverase
Copy link
Contributor Author

iverase commented Jul 17, 2024

@elasticmachine update branch

@iverase iverase merged commit e412f28 into elastic:main Jul 17, 2024
15 checks passed
@iverase iverase deleted the SingleValueMatchQuery branch July 17, 2024 11:18
salvatore-campagna pushed a commit to salvatore-campagna/elasticsearch that referenced this pull request Jul 17, 2024
This commit simplifies SingleValueQuery by introducing a new lucene query called SingleValueMatchQuery. 
This query iterates over doc values and return only the fields that are single-valued.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.16.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants