
LOOKUP shouldn't duplicate the output if the same field was already present in the input #109392

Closed · astefan opened this issue Jun 5, 2024 · 4 comments · Fixed by #109807
Labels: :Analytics/ES|QL, >enhancement, Team:Analytics

Comments

@astefan
Contributor

astefan commented Jun 5, 2024

Description

This is a follow-up to the LOOKUP work: if the input already contains a field with the same name as one that LOOKUP introduces, both fields appear in the results. We need to be consistent here:

  • only one of the fields should be in the results
  • that field should be the one introduced by LOOKUP (the same approach the ENRICH command uses)
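The rule in the two bullets can be sketched at the schema level. This is a minimal Python model, not the ES|QL implementation: `merge_output_columns` is a hypothetical helper, and placing the LOOKUP columns after the surviving input columns is an assumption borrowed from how ES|QL's EVAL treats a name that already exists (the old column is dropped and the new one appended).

```python
def merge_output_columns(input_columns: list[str], lookup_columns: list[str]) -> list[str]:
    """Combine an input schema with the columns a LOOKUP adds (hypothetical helper).

    Any input column whose name matches a LOOKUP column is dropped, so the
    LOOKUP column wins and every name appears exactly once.
    `lookup_columns` is assumed to exclude the match key, which the input
    already has.
    """
    added = set(lookup_columns)
    # Keep input columns that are not shadowed, in their original order.
    kept = [name for name in input_columns if name not in added]
    # Assumption: LOOKUP-introduced columns go after the surviving input columns.
    return kept + lookup_columns


# Schema of `ROW int = 5, name = 123 | LOOKUP int_number_names ON int`:
# the table contributes `name`; the key `int` is already in the input.
print(merge_output_columns(["int", "name"], ["name"]))  # ['int', 'name']
```

With shadowing applied, the `name` column from the input disappears instead of showing up next to the one LOOKUP adds.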
astefan mentioned this issue Jun 5, 2024
elasticsearchmachine added the Team:Analytics label Jun 5, 2024
@elasticsearchmachine
Collaborator

Pinging @elastic/es-analytical-engine (Team:Analytics)

@nik9000
Member

nik9000 commented Jun 5, 2024

I don't believe both appear in the results:

// Makes sure the LOOKUP squashes previous names 
doesNotDuplicateNames
required_capability: lookup
FROM employees
| SORT emp_no
| LIMIT 4
| RENAME languages.long AS long
| EVAL name = CONCAT(first_name, " ", last_name)
| LOOKUP long_number_names ON long
| RENAME long AS languages
| KEEP emp_no, languages, name
;

emp_no:integer | languages:long | name:keyword
         10001 |              2 | two
         10002 |              5 | five
         10003 |              4 | four
         10004 |              5 | five
;

At least, not with the code as it stood yesterday: the LOOKUP result wins. The column pruning does happen too late, which is a problem, but the output currently looks OK.

Also, I might be doing it in a weird way. The QL rules are indeed complex.

@astefan
Contributor Author

astefan commented Jun 5, 2024

@nik9000 the query I was looking at a few hours ago is much simpler:

{
    "query": "ROW int = 5, name = 123 | LOOKUP int_number_names ON int",
    "tables": {"int_number_names": {"int:integer": [0,1,2,3,4,5,6,7,8,9,10], "name:keyword": ["zero","one","two","three","four","five","six","seven","eight","nine","ten"]}}
}

The result I got:

      int      |     name      |     name      
---------------+---------------+---------------
5              |123            |five           
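To make the duplicated `name` column concrete, here is a minimal Python simulation of that request, assuming a single input row and the inline table with its `:integer`/`:keyword` type suffixes dropped. `lookup_row` is a hypothetical helper that already applies the shadowing asked for in the description, so it returns one `name` column holding `five` rather than the two columns above. Returning `None` when no key matches and using the first match when several do are simplifying assumptions, not documented LOOKUP semantics.

```python
def lookup_row(row: dict, table: dict, key: str) -> dict:
    """Join one input row against an inline lookup table on `key` (hypothetical helper).

    `table` maps column names to equal-length lists of values, like the
    `tables` parameter in the request above (type suffixes dropped).
    Non-key table columns are added to the row; a same-named input column
    is shadowed by the table's column.
    """
    # Row indexes in the table whose key equals the input row's key.
    matches = [i for i, v in enumerate(table[key]) if v == row[key]]
    out = dict(row)
    for column, values in table.items():
        if column == key:
            continue  # the match key is already present in the input row
        out.pop(column, None)  # drop the shadowed input column first
        # Re-inserting moves the column to the end; None stands in for null.
        out[column] = values[matches[0]] if matches else None
    return out


names = ["zero", "one", "two", "three", "four", "five",
         "six", "seven", "eight", "nine", "ten"]
tables = {"int": list(range(11)), "name": names}
print(lookup_row({"int": 5, "name": 123}, tables, "int"))  # {'int': 5, 'name': 'five'}
```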

@nik9000
Member

nik9000 commented Jun 5, 2024

neat!

elasticsearchmachine pushed a commit that referenced this issue Jun 25, 2024
Fix #109392

This makes attribute shadowing of LOOKUP consistent with ENRICH, DISSECT/GROK and EVAL.