ESQL: Resolve tables to LocalRelation centrally #110097

Closed
wants to merge 1 commit into from

nik9000
Member

@nik9000 nik9000 commented Jun 24, 2024

This moves the resolution of `table` parameters to `LocalRelation`s so that we can cache that resolution. So we get the same `LocalRelation` every time we resolve the same table. That isn't strictly needed, but it feels good.
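
A minimal sketch of the caching idea in this description, for illustration only. `Column`, `LocalRelation`, and `TableResolver` are hypothetical stand-ins, not the real ES|QL classes, and the cache is a plain `HashMap`, so the sketch is not thread-safe.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins; the real ES|QL types carry blocks and attributes.
record Column(String type, List<Object> values) {}

record LocalRelation(List<String> columnNames, List<Column> columns) {}

final class TableResolver {
    private final Map<String, Map<String, Column>> tables;
    // Lazily filled cache: table name -> resolved relation.
    private final Map<String, LocalRelation> resolved = new HashMap<>();

    TableResolver(Map<String, Map<String, Column>> tables) {
        this.tables = tables;
    }

    LocalRelation resolve(String tableName) {
        Map<String, Column> table = tables.get(tableName);
        if (table == null) {
            throw new IllegalArgumentException("unknown table [" + tableName + "]");
        }
        // Build once, then hand back the same instance on every later call.
        return resolved.computeIfAbsent(
            tableName,
            name -> new LocalRelation(List.copyOf(table.keySet()), List.copyOf(table.values()))
        );
    }
}
```

With this shape, two calls to `resolve` with the same table name return the identical object, which is exactly the property the review below questions.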

@nik9000 nik9000 requested a review from alex-spies June 24, 2024 14:05
@nik9000 nik9000 mentioned this pull request Jun 24, 2024
10 tasks
@elasticsearchmachine elasticsearchmachine added the needs:triage Requires assignment of a team area label label Jun 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/kibana-esql (ES|QL-ui)

@nik9000 nik9000 added :Analytics/ES|QL AKA ESQL and removed ES|QL-ui Impacts ES|QL UI labels Jun 24, 2024
@nik9000
Member Author

nik9000 commented Jun 24, 2024

Not a UI change. Sorry, I clicked the wrong button.

@elasticsearchmachine elasticsearchmachine added the Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo) label Jun 24, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@elasticsearchmachine elasticsearchmachine removed the needs:triage Requires assignment of a team area label label Jun 24, 2024
@costin costin requested review from astefan and bpintea June 26, 2024 08:42
Contributor

@alex-spies alex-spies left a comment

Heya; this approach could work if we made shallow clones of the local relation that get new field attributes every time. In its current state, I think this'll lead to bugs further down the road.
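
The shallow-clone idea in this comment could look roughly like the sketch below. `FieldAttr`, `Relation`, and `SharedDataRelations` are hypothetical names invented for illustration, and the id counter is an assumption about how attribute ids might be minted; none of this is the actual ES|QL API.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical attribute: a column name plus a unique id.
record FieldAttr(String name, long id) {}

// Hypothetical relation: attributes describe the columns, blocks hold the data.
record Relation(List<FieldAttr> attributes, Object[] blocks) {}

final class SharedDataRelations {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    // Shallow clone: reuse the cached blocks, but mint a new id for every attribute.
    static Relation shallowClone(Relation cached) {
        List<FieldAttr> fresh = cached.attributes().stream()
            .map(a -> new FieldAttr(a.name(), NEXT_ID.incrementAndGet()))
            .toList();
        return new Relation(fresh, cached.blocks());
    }
}
```

The data is shared between clones, so only the small attribute list is rebuilt per resolution.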

However, when we discussed this approach I assumed that there's some cost of creating the local relations in the first place - but the data is already in the form of blocks, in the columns; this makes me think that it's fine to just create multiple local relations.

IMHO it'd be more important to make sure the memory accounting is enabled and correct for the columns from the tables; these can hog a bit of memory, and it'd be important to correctly incref/decref the contained blocks + hook this up with an actual circuit breaker.
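
As a rough illustration of the incref/decref accounting described above, here is a self-contained sketch. `MemoryBreaker` and `TrackedBlock` are hypothetical classes written for this example; they are not Elasticsearch's circuit breaker or block implementations, and the byte sizes are supplied by the caller rather than measured.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical byte budget standing in for a real circuit breaker.
final class MemoryBreaker {
    private final long limitBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    MemoryBreaker(long limitBytes) {
        this.limitBytes = limitBytes;
    }

    void reserve(long bytes) {
        long updated = usedBytes.addAndGet(bytes);
        if (updated > limitBytes) {
            usedBytes.addAndGet(-bytes); // roll back the failed reservation
            throw new IllegalStateException("would use [" + updated + "] bytes, limit is [" + limitBytes + "]");
        }
    }

    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }
}

// Hypothetical reference-counted block: memory is reserved on creation
// and returned to the breaker when the last reference is dropped.
final class TrackedBlock {
    private final MemoryBreaker breaker;
    private final long sizeBytes;
    private int refs = 1;

    TrackedBlock(MemoryBreaker breaker, long sizeBytes) {
        breaker.reserve(sizeBytes);
        this.breaker = breaker;
        this.sizeBytes = sizeBytes;
    }

    synchronized void incRef() {
        if (refs <= 0) {
            throw new IllegalStateException("block already released");
        }
        refs++;
    }

    synchronized void decRef() {
        if (refs <= 0) {
            throw new IllegalStateException("block already released");
        }
        if (--refs == 0) {
            breaker.release(sizeBytes);
        }
    }
}
```

Each relation that shares a block would call `incRef` when it takes the block and `decRef` when it is done, so the bytes are counted once and released once.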

for (Map.Entry<String, Column> entry : table.entrySet()) {
    Column column = entry.getValue();
    EsField field = new EsField(entry.getKey(), column.type(), Map.of(), false, false);
    attributes.add(new FieldAttribute(Source.EMPTY, null, entry.getKey(), field));
Contributor

Thought: I just realized we act as if we had fields from ES here; maybe it would be more correct to use either reference attributes (then we don't need to reference-ify the field attrs later), or even introduce a new attribute type (local attribute or so).

/**
 * Lazy conversion of {@link #tables} to {@link LocalRelation}.
 */
private final Map<String, LocalRelation> tablesFromLocalRelation = new HashMap<>();
Contributor

@alex-spies alex-spies Jun 26, 2024

Since columns are essentially blocks, wouldn't it be simpler to replace the tables map from line 56 with this, and hand proper local relations to the constructor of EsqlConfiguration immediately?

Scratch that, the local relation would already come with attributes which shouldn't be the same for each local relation, see below.

    attributes.add(new FieldAttribute(Source.EMPTY, null, entry.getKey(), field));
    blocks[i++] = column.values();
}
localRelation = new LocalRelation(Source.EMPTY, attributes, LocalSupplier.of(blocks));
Contributor

I think we're about to lay the ground for subtle bugs with multiple lookups. When there are multiple LOOKUPs in a query that use the same table, the second should replace the attributes added by the first; but they're gonna be the exact same attributes and thus indistinguishable from each other!

Your test illustrates this well:

    FROM test
    | RENAME languages AS int
    | LOOKUP int_number_names ON int
    // now we have `name` and it spells out the number of languages
    | RENAME name AS languages_name, int AS languages
    | EVAL int = LENGTH(last_name)
    | LOOKUP int_number_names ON int
    // now `name` should spell out the length of the last name

After each LOOKUP, the name attribute is exactly the same, down to the attribute id, even though shadowing did happen. On the level of attributes, the second LOOKUP was a noop.

The attribute id is used to resolve shadowing situations, so this is not good.

(For reference: this situation can also come up in SQL implementations, but for JOINs SQL normally requires distinct table names/aliases, which then become qualifiers of the attribute name; in this sense, SQL also has distinct attributes even when joining multiple times with the same table.)
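
The id collision described in this comment can be reproduced with a toy model. In the sketch below, `Attr` and `LookupShadowingDemo` are hypothetical, the id counter is invented for illustration, and the model assumes that a rename keeps the attribute id, as the discussion in this thread describes; it is not the real ES|QL attribute machinery.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Supplier;
import java.util.stream.Stream;

// Hypothetical attribute: a name plus a unique id.
record Attr(String name, long id) {
    private static final AtomicLong NEXT_ID = new AtomicLong();

    static Attr fresh(String name) {
        return new Attr(name, NEXT_ID.incrementAndGet());
    }
}

final class LookupShadowingDemo {
    // Cached variant: every LOOKUP of the table reuses one attribute instance.
    private static final Attr CACHED_NAME = Attr.fresh("name");

    static Attr lookupCached() {
        return CACHED_NAME;
    }

    // Uncached variant: every LOOKUP mints a new attribute.
    static Attr lookupFresh() {
        return Attr.fresh("name");
    }

    // Models LOOKUP, RENAME name AS languages_name, LOOKUP, then counts the
    // output attributes that carry the id of the second lookup's `name`.
    static long attributesSharingId(Supplier<Attr> lookup) {
        Attr first = lookup.get();
        Attr renamed = new Attr("languages_name", first.id()); // assumption: RENAME keeps the id
        Attr second = lookup.get();
        return Stream.of(renamed, second).filter(a -> a.id() == second.id()).count();
    }

    public static void main(String[] args) {
        System.out.println(attributesSharingId(LookupShadowingDemo::lookupCached)); // prints 2
        System.out.println(attributesSharingId(LookupShadowingDemo::lookupFresh));  // prints 1
    }
}
```

With the cached relation, `languages_name` and the new `name` carry the same id, so anything that resolves attributes by id cannot tell them apart; with fresh attributes per lookup, each id matches exactly one attribute.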

Member Author

It sounds like this is a good argument for not doing this change at all.

Contributor

I noticed I made a small mistake in the example I mentioned: between the lookups, we rename `name` (added by the first lookup) to something else, and then add `name` again. My argument stands if we remove the rename, but even as-is it's not great, because we have two differently named attributes with different meanings but the same attribute ID.

I think I'd prefer not doing this change at all.

Member Author

> I think I'd prefer not doing this change at all.

Easy. Closed.

@nik9000 nik9000 removed the v8.15.0 label Jun 27, 2024
@nik9000 nik9000 closed this Jun 27, 2024
@nik9000 nik9000 deleted the esql_join_resolve_in_config branch June 27, 2024 12:43
Labels
:Analytics/ES|QL AKA ESQL >non-issue Team:Analytics Meta label for analytical engine team (ESQL/Aggs/Geo)
3 participants