Highlighting array field - Also return non-matching entries #7416

Open
panmari opened this issue Aug 22, 2014 · 31 comments
Labels
>enhancement good first issue low hanging fruit help wanted adoptme :Search Relevance/Highlighting How a query matched a document Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch

Comments

@panmari

panmari commented Aug 22, 2014

I have an array field with the entries [foo, foobar, bar] and search for foo. The highlighting then returns for that field

[<em>foo</em>, <em>foo</em>bar]

I would like it to return

[<em>foo</em>, <em>foo</em>bar, bar]

I tried setting no_match_size as described at https://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-highlighting.html, but that didn't work. Is there any way to make Elasticsearch behave the way I want?

@panmari panmari changed the title Highlighting array field - but also return non-matching entries Highlighting array field - Also return non-matching entries Aug 22, 2014
@nik9000
Member

nik9000 commented Aug 22, 2014

I don't believe it has an option to do that right now. I don't think it'd be too hard to build though.

@panmari
Author

panmari commented Aug 29, 2014

Which would be better: respecting the no_match_size setting for every single array entry, or introducing a new setting?

@leeho123

Hi, did this ever get fixed? I'm relying on this functionality returning non-matched entries for my application.

@prashanttct07

Hi Team,
Is there any plan to fix this in ES 5.0?

@mouafa

mouafa commented May 25, 2016

+1 here

@edeak

edeak commented Oct 6, 2016

+1

@cosmin-marginean

+1

@hmottestad

Would also like this. Or some other way to find out what fields in the original document actually got highlighted when having nested documents with arrays.

@hvelucha

hvelucha commented Mar 9, 2017

+1

@ashitpupu

Is any fix available for this? My work depends heavily on it.
If a fix or script is available, please let me know.

@vcollignon

+1

@markisme

Waiting for any fixes...

@Abduvakilov

+1

@alex-kuck

+1

@jimczi
Contributor

jimczi commented Mar 22, 2018

cc @elastic/es-search-aggs

@jimczi jimczi added the discuss label Mar 23, 2018
@shwetaskatdare

+1

@grantharper

+1

@musukvl

musukvl commented Aug 20, 2018

+1
I have an ordered list of paragraphs in my documents, so it is very handy to store them as an array and display them directly from the highlight section.
Currently, I need to merge the _source lines and the highlight lines in code. It looks terrible: I strip the highlighting tags and match _source against highlight.

I also need the non-matching entries because arrays are the simplest way to implement relations. I know the order of the sub-items, so I don't need nested objects. For example, I can just store usernames as parallel arrays:

[ "1", "2", "3" ] 
[ "Alice", "John Doe", "Bob" ] 

instead of using objects with ID:

[{id:"1", name: "Alice"}, {id:"2", name: "John Doe"}, {id: "3", name:"Bob"} ] 

@jimczi
Contributor

jimczi commented Aug 21, 2018

We index a multi-valued field as if it were a single value: each entry is appended, separated by a custom separator. This means that the offsets we index are relative to the concatenated text.
This is needed because we don't know which value matched the query, only the offsets in the original text. In this context, no_match_size applies to the situation where none of the values match the query, so we use the first value to populate the response. However, I think we should be able to return all values if the number of fragments is set to 0 (which means highlight the whole text). Currently only values that match the query are returned, but if a single value matches then we should return all values. For the reason stated above this is not low-hanging fruit, but it should be doable.
I don't have time to work on this at the moment so I'll mark this issue with the adoptme tag and will come back to it if nobody is interested.
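
To make the offset problem concrete, here is a minimal sketch of how offsets in a concatenated text could be mapped back to array indices. It is an illustration only, not Elasticsearch's actual implementation: `SEPARATOR`, `concatWithSpans`, and `valueIndexAt` are hypothetical names, and a plain substring search stands in for the real query match.

```typescript
// Illustrative only: not Elasticsearch's actual implementation.
// SEPARATOR is a hypothetical stand-in for the internal value separator.
const SEPARATOR = "\u0000";

interface ValueSpan {
  index: number; // position of the value in the original array
  start: number; // inclusive offset in the concatenated text
  end: number; // exclusive offset in the concatenated text
}

// Concatenate the values and remember where each one starts and ends.
function concatWithSpans(values: string[]): { text: string; spans: ValueSpan[] } {
  const spans: ValueSpan[] = [];
  let text = "";
  values.forEach((value, index) => {
    if (index > 0) text += SEPARATOR;
    spans.push({ index, start: text.length, end: text.length + value.length });
    text += value;
  });
  return { text, spans };
}

// Map an offset in the concatenated text back to the array index it falls in.
// Returns -1 when the offset lands on a separator or outside the text.
function valueIndexAt(spans: ValueSpan[], offset: number): number {
  const span = spans.find((s) => offset >= s.start && offset < s.end);
  return span ? span.index : -1;
}

const concatenated = concatWithSpans(["foo", "foobar", "bar"]);

// Collect the offset of every occurrence of "foo" (a stand-in for a query match).
const matchOffsets: number[] = [];
for (
  let i = concatenated.text.indexOf("foo");
  i !== -1;
  i = concatenated.text.indexOf("foo", i + 1)
) {
  matchOffsets.push(i);
}

// Indices of the array entries that contain a match.
const matchedIndices = matchOffsets.map((o) => valueIndexAt(concatenated.spans, o));
```

With per-value spans like these available, the entries that contain no match are simply the indices missing from `matchedIndices`, which is the information the original request asks the highlighter to expose.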

@jimczi jimczi added the help wanted adoptme label Aug 21, 2018
@jimczi jimczi removed their assignment Aug 21, 2018
@otherBoy

Any update on the newest version?

@mingyitianxia

Any update on the newest version?

@robmartin-scibite

This would be really useful. It would make things a lot easier for a project I'm working on. Any update as to whether we are likely to see it implemented any time soon?

@rjernst rjernst added the Team:Search Meta label for search team label May 4, 2020
@jun915tae

Waiting for any fixes

@kazykenov

+1

@nemphys

nemphys commented May 5, 2021

Any news on this one? It's quite old and I suppose a lot of people would be happy to see it implemented.

@belousevgen

belousevgen commented Jun 16, 2021

+1
Our project also depends on this feature. Any news on it?

I really don't see how I can use the highlight feature until this is resolved.
It is very handy to display an array of values directly from the highlight section, but since the non-matching entries are missing, that doesn't work for me.

I thought I could manually merge source and highlight, but with a simple array of strings it's not possible to merge:

source: ["foo", "bar", "foobar"] (3 records)
highlights: ["<em>foo</em>", "<em>foo</em>bar"] (2 records)

As a workaround, I could match them by index if highlights returned null for non-matching records, like
["<em>foo</em>", null, "<em>foo</em>bar"]

Is there another workaround at this point?
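
One possible client-side approach is an ordered merge, sketched below. It rests on two assumptions that are not confirmed anywhere in this thread: that the highlights come back in the same relative order as the source values, and that number_of_fragments is 0 so each highlight covers one whole value. `stripTags` and `alignHighlights` are hypothetical helper names.

```typescript
// Strip highlight tags; split/join treats the tags as literal text.
function stripTags(text: string, preTag = "<em>", postTag = "</em>"): string {
  return text.split(preTag).join("").split(postTag).join("");
}

// Walk the source values in order and consume highlights in order.
// Assumes highlights keep the relative order of the source values and that
// each highlight covers one whole value (number_of_fragments set to 0).
function alignHighlights(source: string[], highlights: string[]): (string | null)[] {
  let next = 0; // index of the next highlight that has not been placed yet
  return source.map((value) => {
    if (next < highlights.length && stripTags(highlights[next]) === value) {
      return highlights[next++];
    }
    return null; // no highlight for this entry
  });
}

const sourceValues = ["foo", "bar", "foobar"];
const aligned = alignHighlights(sourceValues, ["<em>foo</em>", "<em>foo</em>bar"]);

// Fall back to the raw source value wherever nothing matched.
const merged = aligned.map((h, i) => h ?? sourceValues[i]);
```

Here `aligned` is the null-padded array and `merged` is the full list with highlights swapped in. Because highlights are consumed in order rather than searched for, repeated values in the source are each paired with their own highlight.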

@ghost

ghost commented Apr 14, 2022

+1

@Selroy46

Selroy46 commented Jul 25, 2023

If you have a nested object, there is a way to find out which element was highlighted: use inner_hits in a nested query.

For example, if you have a document like this:

{
  "id": 1,
  "title": "foo",
  "comments": [{"id": 1, "text": "foo"}, {"id": 2, "text": "bar"}, {"id": 3, "text": "foobar"}]
}

You can make a query like this:

{
  "nested": {
    "path": "comments",
    "query": {"term": {"comments.text": "foobar"}},
    "inner_hits": {
      "_source": {"includes": "comments.text"},
      "highlight": {"fields": {"comments.text": {"number_of_fragments": 0}}}
    }
  }
}

Now when you run a search, each hit contains an "inner_hits" field, somewhere after "_source". It will look something like this; the index of the highlighted element is in the _nested.offset field.

"inner_hits": {
        "comments": {
            "hits": {
                "hits": [
                    {
                        "_nested": {"field": "comments", "offset": 2},
                        "_source": {"text": "foobar"},
                        "highlight": {"comments.text": ["<em>foobar</em>"]}
                    }
                ]
            }
        }
}

It is not perfect, but I hope it will help someone.
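
To get back a full list that includes the non-matching entries, the inner hits can be overlaid onto the array from _source by offset. The following is a rough sketch under assumptions: the `InnerHit` and `NestedComment` types only mirror the response shape shown above, and `overlayInnerHits` is a hypothetical helper, not part of any Elasticsearch client.

```typescript
// Shape of one inner hit, mirroring the response shown above.
interface InnerHit {
  _nested: { field: string; offset: number };
  _source: { text: string };
  highlight?: Record<string, string[]>;
}

interface NestedComment {
  id: number;
  text: string;
}

// Start from the plain text of every comment, then replace the entries that
// have an inner hit with their highlighted text, located via _nested.offset.
function overlayInnerHits(
  comments: NestedComment[],
  innerHits: InnerHit[],
  field = "comments.text",
): string[] {
  const result = comments.map((c) => c.text);
  for (const hit of innerHits) {
    const fragments = hit.highlight?.[field];
    const offset = hit._nested.offset;
    if (fragments && fragments.length > 0 && offset >= 0 && offset < result.length) {
      result[offset] = fragments[0];
    }
  }
  return result;
}

const commentList: NestedComment[] = [
  { id: 1, text: "foo" },
  { id: 2, text: "bar" },
  { id: 3, text: "foobar" },
];
const innerHitList: InnerHit[] = [
  {
    _nested: { field: "comments", offset: 2 },
    _source: { text: "foobar" },
    highlight: { "comments.text": ["<em>foobar</em>"] },
  },
];

const overlaid = overlayInnerHits(commentList, innerHitList);
```

Note that inner_hits returns only the top three matching hits by default, so the inner_hits "size" would likely need to be raised for arrays with more matching elements.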

@bhaveshpatel640

+1

@nwaughachukwuma

My workaround

Use a regex to remove the pre/post tags from each highlight and compare the result against the data in _source. This covers all cases, highlighted or not.

Example

Source: { 'path.to.arrayField': [foo, foobar, bar] }
Highlight: [<em>foo</em>, <em>foo</em>bar]

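// escape regex metacharacters so custom tags are matched literally
function escapeRegExp(s: string) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

export function removeHighlightTags(
  text: string,
  preTag = '<em>',
  postTag = '</em>',
) {
  return text
    .replace(new RegExp(escapeRegExp(preTag), 'g'), '')
    .replace(new RegExp(escapeRegExp(postTag), 'g'), '')
    .trim()
}

// fetch data
const {hits} = await es_client.search(...)

// let's handle a single hit
const hit = hits.hits[0]
// highlight is absent when nothing matched, so default to an empty array
const highlights: string[] = hit.highlight?.['path.to.arrayField.arrayItem'] ?? []
// read the full array from _source (key as in the example above)
const source: string[] = hit._source['path.to.arrayField']

// keep every source item, swapping in its highlighted form when one exists
const result = source.map((item) => {
    return highlights.find((h) => removeHighlightTags(h) === item.trim()) || item
})

console.log(result) // -->  [<em>foo</em>, <em>foo</em>bar, bar]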

@jsphstls

I've created a similar issue where highlighted array items could be indicated by path as an alternative to the original proposal of including non-highlighted items within highlights.

@javanna javanna added Team:Search Relevance Meta label for the Search Relevance team in Elasticsearch and removed Team:Search Meta label for search team labels Jul 12, 2024