Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ES|QL] weighted_avg #109993

Merged
merged 16 commits into from
Jul 2, 2024
Merged

Conversation

fang-xing-esql
Copy link
Member

@fang-xing-esql fang-xing-esql commented Jun 20, 2024

Add weighted_avg function for #109940

We need two rules to be able to support weighted_avg - ReplaceStatsNestedExpressionWithEval and SubstituteSurrogates.

SubstituteSurrogates rewrites weighted_avg(value, weight) to sum(value*weight)/sum(weight), and ReplaceStatsNestedExpressionWithEval rewrite sum(value*weight) to eval f=value*weight and then sum(f). And we want ReplaceStatsNestedExpressionWithEval to be applied after SubstituteSurrogates. Today ReplaceStatsNestedExpressionWithEval is applied before SubstituteSurrogates, and they are applied only once. Switching the order causes some test failures, so a separate batch is created for the 4 rules related to stats transformation, so that they can be applied multiple times together.

Copy link

Documentation preview:

@elasticsearchmachine
Copy link
Collaborator

Hi @fang-xing-esql, I've created a changelog YAML for you.

@fang-xing-esql
Copy link
Member Author

fang-xing-esql commented Jun 20, 2024

The weighted_avg is similar to avg, it is a SurrogateExpression, is rewritten with existing functions.

@astefan
Copy link
Contributor

astefan commented Jun 21, 2024

@elasticmachine update branch

@fang-xing-esql fang-xing-esql marked this pull request as ready for review June 24, 2024 03:18
@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@astefan astefan requested a review from alex-spies June 25, 2024 10:21
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @fang-xing-esql ! We should probably discuss if we really want to add the new transformation batch, or if we can somehow do without that. Other than that, I have mostly minor remarks.

return new MvAvg(s, field);
}
if (weight.foldable()) {
return new Div(s, new Sum(s, field), new Count(s, field), dataType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think we could use Avg.surrogate() rather than re-implementing it here.
  • This doesn't account for the case when weight is null. Annoyingly, we cannot just fold weight here because FoldNull needs to run first; but maybe we could just run FoldNull here, specifically. We should also add csv tests for this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the validation of weight(null or zero) in resolveType(), that's the best place that I can think of for now, as at the end of logical plan optimizer, the weighted_avg will be rewritten to other expressions, validate() does not work there.

if (weight.foldable()) {
return new Div(s, new Sum(s, field), new Count(s, field), dataType());
} else {
return new Div(s, new Sum(s, new Mul(s, field, weight)), new Sum(s, weight), dataType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought (absolutely not in scope here): When aggregating, it's normally okay when the field is multivalued and I think the result of aggs is the same as if the field was mv_expand'ed beforehand. I wonder if weighted averages might be more useful if the substitution behaved the same way.

| MV_AVG(field, weights)
->
| SUM( MV_SUM(field) * weight) / SUM(MV_COUNT(field) * weight)

or even ->
| SUM( MV_SUM(field) * MV_SUM(weight) / SUM( MV_COUNT(field) * MV_SUM(weight))

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was also thinking of how to deal with multi-valued field in the aggregation function, or even have a mv_weighted_agg? It might be confusing when combining them together, perhaps keep it simple for now and expand it when we see more use cases.

return new MvAvg(s, field);
}
if (weight.foldable()) {
return new Div(s, new Sum(s, field), new Count(s, field), dataType());
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this condition here is correct. Maybe you meant weight is foldable and equals to 1, then this is considered to be the regular average. Also, you cannot call fold here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the formula itself is correct if the weight is non-null and not 0 - this should be just Avg then.

Speaking of, we probably should also add tests for a constant weight of value 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I've looked up the formula online and did the math. I was misled by the ES documentation on this aggregation.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR LGTM after adding the tests @alex-spies mentioned here and, also, updating from main and removing the change done to the LogicalPlanOptimizer.

@fang-xing-esql fang-xing-esql added the ES|QL-ui Impacts ES|QL UI label Jul 2, 2024
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@fang-xing-esql fang-xing-esql merged commit 8abc885 into elastic:main Jul 2, 2024
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL >enhancement ES|QL-ui Impacts ES|QL UI Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants