Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ClassCastException with MV_EXPAND on missing field #110096

Merged

Conversation

luigidellaquila
Copy link
Contributor

Fixes #109974

LocalLogicalPlanOptimizer.ReplaceMissingFieldWithNull tries to replace missing fields (ie. fields that are not present on the mappings of local node) with a NULL literal.
This is not possible for MvExpand, because the target is a NamedExpression (and Literal is not). This caused ClassCastException.

This PR fixes the problem using a null Alias in this specific case.

@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @luigidellaquila, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2024
@astefan astefan self-requested a review June 25, 2024 10:29
Copy link
Contributor

@alex-spies alex-spies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Luigi, nice catch!

I have some minor remarks, but this LGTM!

plan = plan.transformExpressionsOnlyUp(
FieldAttribute.class,
f -> stats.exists(f.qualifiedName()) ? f : new Alias(f.source(), f.name(), null, Literal.of(f, null), f.id())
);
}
// otherwise transform fields in place
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehm, isn't the default case here just asking for trouble? We're replacing field attributes by literals; that's only safe to do further down in the expression tree (if at all). I wonder if we rather shouldn't optimize at all in these cases; or maybe have a version of transformExpressionsOnlyUp that never applies to the top-level expression. (Although that would still be unsafe in general, but could be acceptable here.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a reasonable concern, probably we should not optimize at all

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I removed the optimization for now

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, you mean the final else. Yes, that's fragile and could lead to more similar failures when we add new commands.

I didn't change it because it's needed to trigger a few more downstream optimizations, see all the LocalLogicalPlanOptimizerTests.testMissingField* tests, and because it's relatively safe with current feature set.

It's worth remembering that it's dangerous though, I'll open an issue so that we can revisit it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

// TODO but this could be a follow-up optimization
plan = plan.transformExpressionsOnlyUp(
FieldAttribute.class,
f -> stats.exists(f.qualifiedName()) ? f : new Alias(f.source(), f.name(), null, Literal.of(f, null), f.id())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, rather than replacing the expression but keeping the MV_EXPAND, wouldn't it be cleaner to replace the MV_EXPAND by an EVAL? That might profit better from our EVAL optimizations, pruning rules, etc. (if they're applied after this - didn't check.) or at least be easy to understand what to expect from.

E.g.

FROM idx
| MV_EXPAND field

becomes (if field is missing)

FROM idx
| EVAL field = null

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered it as a follow-up.
I didn't do it with this PR because it can be slightly tricky and I wanted to treat it as an optimization rather than a fix.
In particular, in the example you propose, field represents two different things:

  • for MV_EXPAND it's the target; the output will have the same name but a different ID
  • for EVAL it's a completely new field, that will need mask the previous one and have the same ID as the output for MV_EXPAND.

var plan = plan("""
from test
| mv_expand last_name
| keep first_name
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be

| keep last_name

This may (in the future) make this test fail (for the wrong reasons): last_name is irrelevant and may be pruned.

@astefan
Copy link
Contributor

astefan commented Jun 25, 2024

Please, hold on on merging this PR.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@luigidellaquila
Copy link
Contributor Author

Thanks folks! Merging

@luigidellaquila luigidellaquila merged commit df47e26 into elastic:main Jun 28, 2024
15 checks passed
@elasticsearchmachine
Copy link
Collaborator

💔 Backport failed

Status Branch Result
8.14 Commit could not be cherrypicked due to conflicts

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 110096

@luigidellaquila
Copy link
Contributor Author

Manual backport #110264

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL backport pending >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.14.3 v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ES|QL: ClassCastException - Literal cannot be cast to NamedExpression during local planning
4 participants