-
Notifications
You must be signed in to change notification settings - Fork 24.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[ES|QL] weighted_avg #109993
[ES|QL] weighted_avg #109993
Conversation
Documentation preview: |
Hi @fang-xing-esql, I've created a changelog YAML for you. |
The weighted_avg is similar to avg, it is a SurrogateExpression, is rewritten with existing functions. |
...ql/src/main/java/org/elasticsearch/xpack/esql/expression/function/aggregate/WeightedAvg.java
Outdated
Show resolved
Hide resolved
@elasticmachine update branch |
Pinging @elastic/es-analytical-engine (Team:Analytics) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @fang-xing-esql ! We should probably discuss if we really want to add the new transformation batch, or if we can somehow do without that. Other than that, I have mostly minor remarks.
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/action/EsqlCapabilities.java
Outdated
Show resolved
Hide resolved
x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/io/stream/PlanNamedTypes.java
Outdated
Show resolved
Hide resolved
return new MvAvg(s, field); | ||
} | ||
if (weight.foldable()) { | ||
return new Div(s, new Sum(s, field), new Count(s, field), dataType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I think we could use
Avg.surrogate()
rather than re-implementing it here. - This doesn't account for the case when
weight
is null. Annoyingly, we cannot just foldweight
here becauseFoldNull
needs to run first; but maybe we could just runFoldNull
here, specifically. We should also add csv tests for this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added the validation of weight(null or zero) in resolveType()
, that's the best place that I can think of for now, as at the end of logical plan optimizer, the weighted_avg will be rewritten to other expressions, validate()
does not work there.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
if (weight.foldable()) { | ||
return new Div(s, new Sum(s, field), new Count(s, field), dataType()); | ||
} else { | ||
return new Div(s, new Sum(s, new Mul(s, field, weight)), new Sum(s, weight), dataType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thought (absolutely not in scope here): When aggregating, it's normally okay when the field is multivalued and I think the result of aggs is the same as if the field was mv_expand'ed beforehand. I wonder if weighted averages might be more useful if the substitution behaved the same way.
| MV_AVG(field, weights)
->
| SUM( MV_SUM(field) * weight) / SUM(MV_COUNT(field) * weight)
or even ->
| SUM( MV_SUM(field) * MV_SUM(weight) / SUM( MV_COUNT(field) * MV_SUM(weight))
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was also thinking of how to deal with multi-valued field in the aggregation function, or even have a mv_weighted_agg? It might be confusing when combining them together, perhaps keep it simple for now and expand it when we see more use cases.
return new MvAvg(s, field); | ||
} | ||
if (weight.foldable()) { | ||
return new Div(s, new Sum(s, field), new Count(s, field), dataType()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this condition here is correct. Maybe you meant weight is foldable and equals to 1, then this is considered to be the regular average. Also, you cannot call fold
here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the formula itself is correct if the weight is non-null and not 0 - this should be just Avg then.
Speaking of, we probably should also add tests for a constant weight of value 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I've looked up the formula online and did the math. I was misled by the ES documentation on this aggregation.
...k/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/optimizer/LogicalPlanOptimizer.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR LGTM after adding the tests @alex-spies mentioned here and, also, updating from main
and removing the change done to the LogicalPlanOptimizer.
Pinging @elastic/kibana-esql (ES|QL-ui) |
Add weighted_avg function for #109940
We need two rules to be able to support weighted_avg -
ReplaceStatsNestedExpressionWithEval
andSubstituteSurrogates
.SubstituteSurrogates
rewritesweighted_avg(value, weight)
tosum(value*weight)/sum(weight)
, andReplaceStatsNestedExpressionWithEval
rewritesum(value*weight)
toeval f=value*weight
and thensum(f)
. And we wantReplaceStatsNestedExpressionWithEval
to be applied afterSubstituteSurrogates
. TodayReplaceStatsNestedExpressionWithEval
is applied beforeSubstituteSurrogates
, and they are applied only once. Switching the order causes some test failures, so a separate batch is created for the 4 rules related to stats transformation, so that they can be applied multiple times together.