Use separate analyzer on the search side for ngram-based search. #1711

Open
wants to merge 1 commit into
base: master


@xluandc xluandc commented Feb 11, 2020

The n-gram search on text was commented out, most likely because it hurt search relevance. On closer inspection, there appear to be two main issues:
(1) The same n-gram analyzer was used for both indexing and search. The search side typically needs a different analyzer, one without the n-gram token filters, so that the query itself is not broken into n-grams.
(2) The general n-gram token filter was used. It emits n-grams from inside a word as well as from its start, which may hurt relevance.
This pull request fixes both issues and re-enables n-gram based search on text, matching at word boundaries (as opposed to only at the beginning of the whole text).
The changes in this pull request have been deployed and used for months at the Lister Hill Center of the National Library of Medicine, and the results seem positive so far.
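
To illustrate the two issues, here is a minimal, self-contained Python sketch. It is not the code in this pull request and does not use any search engine API; the function names (`tokenize`, `ngrams`, `edge_ngrams`, `index_terms`, `search_terms`, `matches`) and the `min_gram`/`max_gram` defaults are assumptions made up for the example. It only mimics the idea: edge n-grams per word at index time, plain word tokens at search time.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for a real tokenizer)."""
    return text.lower().split()


def ngrams(word, min_gram, max_gram):
    """General n-grams: substrings of length min_gram..max_gram anywhere in the word."""
    return [
        word[i:i + n]
        for n in range(min_gram, max_gram + 1)
        for i in range(len(word) - n + 1)
    ]


def edge_ngrams(word, min_gram, max_gram):
    """Edge n-grams: only prefixes of the word, i.e. anchored at the word boundary."""
    return [word[:n] for n in range(min_gram, min(max_gram, len(word)) + 1)]


def index_terms(text, min_gram=2, max_gram=10):
    """Index-side analysis (hypothetical): edge n-grams of every word."""
    terms = set()
    for word in tokenize(text):
        terms.update(edge_ngrams(word, min_gram, max_gram))
    return terms


def search_terms(query):
    """Search-side analysis (hypothetical): plain tokens, no n-gram filter."""
    return set(tokenize(query))


def matches(query, text):
    """A document matches when every query token is one of its indexed terms."""
    return search_terms(query) <= index_terms(text)


doc = "National Library of Medicine"

# Prefixes of any word match, not just the start of the whole text.
print(matches("lib med", doc))   # True

# Issue (2): a general n-gram filter would index "brary" (inside "library"),
# while the edge n-gram filter does not.
print("brary" in ngrams("library", 2, 10))   # True
print(matches("brary", doc))                 # False

# Issue (1): if the query were also n-grammed, "medical" would share the
# term "me" with an unrelated document such as "metal fatigue".
print(set(edge_ngrams("medical", 2, 10)) & index_terms("metal fatigue"))   # {'me'}
```

In this sketch, words shorter than `min_gram` and query tokens longer than `max_gram` never match; a real configuration would have to decide how to handle those cases.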


2 participants