Improve default text analysis settings #669

cmark · 2020-09-21T17:46:33Z

This PR makes the following changes to the index analysis settings:

Set split_on_numerics to false by default on the word_splitter token filter, which is used by default to tokenize concept descriptions, code terms, and any other text that requires tokenized analyzer search. This keeps words like 2D, 3D, and 4D as single terms in the index instead of splitting them into two terms (2 and D), in which case a search for 2D returns no results unless term prefix matching is used.
Add keyword as an available analyzer for searches where the search term should be kept as is.
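
To illustrate the difference, here is a simplified Python model of the two behaviors. It is only a sketch, not the actual token filter code: the function names split_words and keyword are hypothetical, and the real filter applies more rules than the letter/digit boundary split shown here.

```python
import re


def split_words(text, split_on_numerics=False):
    """Toy stand-in for a word-splitting token filter (hypothetical helper).

    Splits on non-alphanumeric characters and lowercases the tokens. When
    split_on_numerics is True, it also splits at letter/digit boundaries.
    """
    tokens = []
    for word in re.findall(r"[A-Za-z0-9]+", text):
        if split_on_numerics:
            # "2D" becomes "2" and "D"
            tokens.extend(re.findall(r"[A-Za-z]+|[0-9]+", word))
        else:
            # "2D" stays a single term
            tokens.append(word)
    return [t.lower() for t in tokens]


def keyword(text):
    """Toy stand-in for a keyword analyzer: the whole input is one term."""
    return [text]


print(split_words("2D echocardiography", split_on_numerics=True))
# ['2', 'd', 'echocardiography']
print(split_words("2D echocardiography", split_on_numerics=False))
# ['2d', 'echocardiography']
print(keyword("2D echocardiography"))
# ['2D echocardiography']
```

With split_on_numerics disabled, the indexed term 2d lines up with the term produced for the query 2D, so an exact term match is possible without prefix matching.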

This PR also includes some minor API improvements.

[index] commit progress on analyzer improvements

3aeec09

cmark self-assigned this Sep 21, 2020

cmark added the enhancement label Sep 21, 2020

cmark merged commit ad458e7 into 7.x Sep 21, 2020

cmark deleted the issue/analyzer-improvements branch September 21, 2020 17:48
