Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL/Search: Fix scoring when RANK and WHERE are both specified #109851

Merged
merged 2 commits into from
Jun 19, 2024

Conversation

ioanatia
Copy link
Contributor

@ioanatia ioanatia commented Jun 18, 2024

DRAFT: WIP needs more tests

Fixes the following issues:

  • A query like SEARCH test [ | WHERE id > 2 AND id <= 4 ] returned scores > 0 - in this case we are only filtering the results, which has no effect on scoring, hence the return scores should all be 0.
  • A query where we combined RANK and WHERE returned scores equal to 0- for example SEARCH test [ | WHERE id > 2 | RANK MATCH(content, "quick brown dog")]

This happened because of the way we were forming the boolean query.
PushRankToSource now translates the RANK expression as a SHOULD clause, instead of FILTER.
Both PushRankToSource and PushFiltersToSource used Queries.combine which receives the type of clause to use and the a list of queries. If the list of queries contains a single non-null query, it simply returns that query - it will not actually construct a boolean query. Furthermore QueriesCombine will check if the first query from the list is a boolean query and it will attempt to reuse it.
To work around this behaviour we made sure that the first query that is being sent to QueriesCombine is always a boolean query.
Furthermore, I had to put the minimumShouldMatch: 1 - so that when RANK is used, we return documents that match the RANK expression.

Copy link
Contributor

@ChrisHegarty ChrisHegarty left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

assertThat(values.get(0), contains(3, 2.0f));
assertThat(values.get(1), contains(4, 2.0f));
assertThat(values.get(0), contains(3, 0.0f));
assertThat(values.get(1), contains(4, 0.0f));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

var query = Queries.combine(Clause.FILTER, asList(queryExec.query(), planQuery));
var baseQuery = queryExec.query() != null ? queryExec.query() : new BoolQueryBuilder();
BoolQueryBuilder query = (BoolQueryBuilder) Queries.combine(Clause.SHOULD, asList(baseQuery, planQuery));
query.minimumShouldMatch(1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@ioanatia ioanatia marked this pull request as ready for review June 19, 2024 10:51
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 19, 2024
@ioanatia
Copy link
Contributor Author

I added some tests and made sure we are modifying the behaviour of PushFiltersToSource only when scoring is needed.

@ioanatia ioanatia merged commit c070ce3 into elastic:esql/search Jun 19, 2024
9 of 17 checks passed
@ioanatia ioanatia deleted the fix_push_rank_to_source branch June 19, 2024 10:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs:triage Requires assignment of a team area label
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants