Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ESQL: Fix LOOKUP attribute shadowing #109807

Merged
merged 22 commits into from
Jun 25, 2024

Conversation

alex-spies
Copy link
Contributor

@alex-spies alex-spies commented Jun 17, 2024

Fix #109392

This makes attribute shadowing of LOOKUP consistent with ENRICH, DISSECT/GROK and EVAL.

@alex-spies alex-spies marked this pull request as ready for review June 17, 2024 16:36
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine
Copy link
Collaborator

Hi @alex-spies, I've created a changelog YAML for you.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 17, 2024
@@ -14,6 +14,43 @@ foo bar | null | null
;


shadowing
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added analogous csv tests to LOOKUP, ENRICH, DISSECT, GROK and EVAL, since they should handle attribute shadowing identically.

@@ -39,7 +38,7 @@ public class Lookup extends UnaryPlan {
/**
* References to the input fields to match against the {@link #localRelation}.
*/
private final List<NamedExpression> matchFields;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are always attributes (unresolved or resolved); the full generality of NamedExpression (including Alias etc.) makes reasoning about the match fields later down the road needlessly hard.

Comment on lines +92 to +93
List<Attribute> fieldsAddedFromRight = removeCollisionsWithMatchFields(rightOutput, matchFieldSet, matchFieldNames);
yield mergeOutputAttributes(makeNullable(makeReference(fieldsAddedFromRight)), leftOutput);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using mergeOutputAttributes correctly here is the crux of this PR.

@alex-spies
Copy link
Contributor Author

I'm a little conflicted on whether some logical plans should have unit tests. Arguably, we do catch inconsistent logical plan outputs either via the dependency checker or in csv tests; on the other hand, computing the output esp. for Join is complex enough that I'd normally want to test this in particular.
For this PR, I remained consistent with the existing test infrastructure and only tested via csv tests.

Copy link
Contributor

@astefan astefan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

}
}
return results;
return attributes.stream().filter(attr -> matchFields.contains(attr) || matchFieldNames.contains(attr.name()) == false).toList();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We tend to avoid using streams, except tests, as it's an unnecessary runtime complication.

public HashJoinExec(PlanStreamInput in) throws IOException {
super(Source.readFrom(in), in.readPhysicalPlanNode());
this.joinData = new LocalSourceExec(in);
this.matchFields = in.readNamedWriteableCollectionAsList(NamedExpression.class);
this.matchFields = (List<Attribute>) (List) in.readNamedWriteableCollectionAsList(NamedExpression.class);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we haven't cut a release of with this and we don't allow these constructs in serverless yet I think you can change this to in.readNamedWriteableCollectionAsList(Attribute.class). There isn't any code that'll ever send anything that isn't an Attribute.

Also, if these were always Attribute subclasses in practice then this'd be safe here too - it's just on the read side.

@nik9000 nik9000 mentioned this pull request Jun 20, 2024
10 tasks
@alex-spies alex-spies added the auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) label Jun 20, 2024
@elasticsearchmachine elasticsearchmachine merged commit 5b959b9 into elastic:main Jun 25, 2024
15 checks passed
@alex-spies alex-spies deleted the fix-lookup-var-shadowing branch June 25, 2024 09:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/ES|QL AKA ESQL auto-merge Automatically merge pull request when CI checks pass (NB doesn't wait for reviews!) >bug Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) v8.15.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LOOKUP shouldn't duplicate the output if the same field was already present in the input
4 participants