Improve default text analysis settings #669

Merged 1 commit on Sep 21, 2020

@cmark (Member) commented Sep 21, 2020

This PR makes the following changes to the index analysis settings:

  • Set split_on_numerics to false by default on the word_splitter token filter, which is used by default to tokenize concept descriptions, code terms, and any other text that requires tokenized analyzer search. This keeps words like 2D, 3D, and 4D as single terms in the index. Previously they were split into two terms (2 and D), so a search for 2D returned no results unless term prefix matching was used.
  • Add keyword as an available analyzer for searches where the search term should be kept as is.
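
To illustrate the two changes above, here is a minimal Python sketch. It is not the code from this PR: `split_words` and `keyword_analyze` are hypothetical helpers that only mimic the tokenization behavior described, and the lowercasing step is an assumption. The `analysis_settings` dict assumes that word_splitter is an Elasticsearch filter of type `word_delimiter` (which does have a `split_on_numerics` option); the actual filter definition in this project may differ.

```python
import re

# Assumed shape of the filter definition; "word_splitter" being of type
# "word_delimiter" is an assumption, not taken from the PR diff.
analysis_settings = {
    "filter": {
        "word_splitter": {
            "type": "word_delimiter",
            "split_on_numerics": False,  # new default described in this PR
        }
    }
}


def split_words(text, split_on_numerics):
    """Simplified stand-in for a word-delimiter style token filter."""
    tokens = []
    for word in text.lower().split():  # lowercasing is an assumption
        if split_on_numerics:
            # Break at every letter/digit boundary: "2d" -> ["2", "d"].
            tokens.extend(re.findall(r"[0-9]+|[a-z]+", word))
        else:
            # Keep mixed letter/digit runs together: "2d" -> ["2d"].
            tokens.extend(re.findall(r"[0-9a-z]+", word))
    return tokens


def keyword_analyze(text):
    """Keyword-style analysis: the whole input is kept as a single term."""
    return [text]


print(split_words("2D echocardiography", split_on_numerics=True))
# ['2', 'd', 'echocardiography']
print(split_words("2D echocardiography", split_on_numerics=False))
# ['2d', 'echocardiography']
print(keyword_analyze("2D echocardiography"))
# ['2D echocardiography']
```

With `split_on_numerics` disabled, 2D survives as the single term `2d`, so an exact term lookup can find it; the keyword-style path leaves the search term completely untouched.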

This PR also includes some minor API improvements.

@cmark self-assigned this Sep 21, 2020
@cmark merged commit ad458e7 into 7.x Sep 21, 2020
@cmark deleted the issue/analyzer-improvements branch September 21, 2020 17:48