Use separate analyzer on the search side for ngram-based search. #1711

xluandc · 2020-02-11T20:36:22Z

The n-gram search on texts was commented out, most likely because it negatively impacted search relevance. Upon a closer look, there seem to be two main issues:
(1) The same n-gram analyzer was used for both indexing and search. Typically the search side needs a different analyzer that does not use the n-gram based token filters.
(2) The general n-gram token filter was used, which also emits n-grams from the middle of a word (not only from its start) and thus may negatively impact relevance.
This pull request is meant to fix the above two issues and re-enable n-gram-based search for texts at word boundaries, i.e. matching the beginning of any word in the text (as opposed to the beginning of the whole text only).
The changes in this pull request have been deployed/used for months at the Lister Hill Center of the National Library of Medicine and the results seem to be positive so far.
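
To illustrate the two issues above, here is a minimal, self-contained sketch in plain Python. It is not the code in this pull request and does not use the actual analyzer API; the function names (`tokenize`, `ngrams`, `edge_ngrams`, `index_terms`, `search_terms`, `matches`), the minimum gram length of 3, and the sample texts are all assumptions made for illustration. It models an index-side analyzer that emits n-grams and a search-side analyzer that only tokenizes and lowercases.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into words on non-alphanumeric characters."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def ngrams(token, min_len, max_len):
    """General n-gram filter: every substring of the token, including inner ones."""
    return {
        token[i:i + n]
        for n in range(min_len, max_len + 1)
        for i in range(len(token) - n + 1)
    }


def edge_ngrams(token, min_len, max_len):
    """Edge n-gram filter: only prefixes of the token (anchored at the word start)."""
    return {token[:n] for n in range(min_len, min(max_len, len(token)) + 1)}


def index_terms(text, gram_fn, min_len=3, max_len=20):
    """Index-side analysis: tokenize, then expand each word with gram_fn."""
    terms = set()
    for token in tokenize(text):
        terms |= gram_fn(token, min_len, max_len)
    return terms


def search_terms(query):
    """Search-side analysis: tokenize only, with no n-gram expansion."""
    return set(tokenize(query))


def matches(query, indexed):
    """A document matches when every query word is present among its indexed terms."""
    terms = search_terms(query)
    return bool(terms) and terms <= indexed


doc = "Chronic kidney disease"
general_index = index_terms(doc, ngrams)    # issue (2): includes inner grams such as "ease"
edge_index = index_terms(doc, edge_ngrams)  # word-boundary prefixes only

# Issue (1): if the query were n-grammed as well, "kidney" would expand to
# "kid", "kidn", ... and overlap with an unrelated document about "kidnapping".
query_grams = index_terms("kidney", edge_ngrams)
other_index = index_terms("kidnapping case", edge_ngrams)
overlap_when_query_is_ngrammed = query_grams & other_index
```

In this sketch, `matches("kid", edge_index)` succeeds because "kid" is a prefix of a word in the middle of the text, while `matches("ease", edge_index)` fails and `matches("ease", general_index)` succeeds, which is the kind of inner-word match that can hurt relevance. Keeping the query un-grammed means `matches("kidney", other_index)` fails even though `overlap_when_query_is_ngrammed` is non-empty.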

Use separate analyzer on the search side for ngram-based search.

479fcdc
