Enhancement: fuzziness above 2 edits #73083

Closed
jguay opened this issue May 14, 2021 · 4 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories Team:Search Meta label for search team

Comments

@jguay
Contributor

jguay commented May 14, 2021

Currently, Elasticsearch supports fuzziness of up to two edits, per the documentation.

The requested feature is fuzziness of 3 (or more) edits.

Example: ALEJANDRO is indexed and should be returned when ALEXANDER or ALEJATYPO is searched with a fuzziness of 3 edits.
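
For reference, a minimal Python sketch (not Elasticsearch code) that checks the edit distances in the example above. It assumes plain Levenshtein distance (insert, delete, substitute); the function name `levenshtein` is only illustrative. Elasticsearch's fuzzy query also counts a transposition as one edit by default, which this sketch does not model.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


indexed = "ALEJANDRO"
for query in ("ALEXANDER", "ALEJATYPO"):
    print(query, levenshtein(indexed, query))
# ALEXANDER 3
# ALEJATYPO 3
```

Both queries are exactly 3 edits from the indexed term, so neither can match with the current maximum of 2.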

Side notes :

  • Synonyms would be a valid and cheaper solution in some cases, but they are outside the scope of this feature request.
  • With async search, Elasticsearch now handles long-running queries better. Fuzziness of 3 edits will obviously be much more costly than 2 edits (a doc update may be needed if support for 3 or more max edits is implemented).
@jguay added the >enhancement, :Search/Search, and needs:triage labels May 14, 2021
@elasticmachine added the Team:Search label May 14, 2021
@elasticmachine
Collaborator

Pinging @elastic/es-search (Team:Search)

@markharwood
Contributor

I wasn't close to the development of the fuzzy implementation, but getting beyond an edit distance of one was notoriously complex. That suggests it's not as simple as raising a setting.
It's worth mentioning that a common and perhaps simpler solution to fuzzy matching is to index with ngrams and search those.
Or did you mean that for edit distance > 2 we should only use async search and revert to a slower brute-force scan of indexed terms (using Levenshtein string comparisons), rather than adding to the complex automaton-based matching used exclusively today?
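
To make the brute-force option concrete, here is a rough Python sketch of scanning a term list with Levenshtein comparisons. It is not how Elasticsearch or Lucene work internally; the names `bounded_levenshtein` and `brute_force_fuzzy`, the in-memory `terms` list, and the early-exit cutoff are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional, Tuple


def bounded_levenshtein(a: str, b: str, max_edits: int) -> Optional[int]:
    """Return the edit distance if it is <= max_edits, otherwise None."""
    # A length gap larger than the budget can never be bridged.
    if abs(len(a) - len(b)) > max_edits:
        return None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # delete ca
                            curr[j - 1] + 1,            # insert cb
                            prev[j - 1] + (ca != cb)))  # substitute or match
        # Row minima never decrease, so we can give up early.
        if min(curr) > max_edits:
            return None
        prev = curr
    return prev[-1] if prev[-1] <= max_edits else None


def brute_force_fuzzy(terms: Iterable[str], query: str,
                      max_edits: int = 3) -> List[Tuple[str, int]]:
    """Compare the query against every term; cost grows with the term count."""
    hits = []
    for term in terms:
        distance = bounded_levenshtein(term, query, max_edits)
        if distance is not None:
            hits.append((term, distance))
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Hypothetical term dictionary standing in for the indexed terms of one field.
terms = ["ALEJANDRO", "ALEXANDRA", "ALBERTO", "ALEXANDER", "MARIA"]
print(brute_force_fuzzy(terms, "ALEXANDER"))
# [('ALEXANDER', 0), ('ALEXANDRA', 2), ('ALEJANDRO', 3)]
```

Every term is visited once per query, which is why such a scan would be slower than automaton-based matching on a large term dictionary.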

@jguay
Contributor Author

jguay commented May 14, 2021

Thank you @markharwood for confirming this is a difficult feature to implement. The ngram approach is indeed a common and good solution, but it isn't suitable in this case, mainly because it would cause a lot of false positives, and the requirement here is to always search with 3 edits.
Example: with ALEJANDRO indexed using "min_gram" : 3, a search for ALZJAZDRZ (3 edits) will not match although it should, while ALEZZZZZZ (6 edits) will match although it should not.
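
A small Python sketch of why trigrams behave this way. It assumes fixed-size grams of length 3 and that a query matches when it shares at least one gram with the indexed term; the helper names `ngrams` and `substitutions` are illustrative and not part of any Elasticsearch API.

```python
def ngrams(term: str, n: int = 3) -> set:
    """All contiguous character n-grams of the term."""
    return {term[i:i + n] for i in range(len(term) - n + 1)}


def substitutions(a: str, b: str) -> int:
    """Count differing positions; only meaningful for equal-length strings."""
    assert len(a) == len(b)
    return sum(ca != cb for ca, cb in zip(a, b))


indexed = "ALEJANDRO"
for query in ("ALZJAZDRZ", "ALEZZZZZZ"):
    shared = sorted(ngrams(indexed) & ngrams(query))
    print(query, substitutions(indexed, query), shared)
# ALZJAZDRZ 3 []
# ALEZZZZZZ 6 ['ALE']
```

The three substitutions in ALZJAZDRZ are spread out so that every trigram is broken, while ALEZZZZZZ keeps the leading ALE intact despite six substitutions.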

I retract that side note on async search. Given the expected exponential cost of adding a 3rd edit, I only wanted to highlight that async search might help, because the search would run slower with a max of 3 edits than with the current 2.

@DJRickyB removed the needs:triage label May 26, 2021
@javanna
Member

javanna commented Jun 24, 2024

This has been open for quite a while, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.

@javanna closed this as not planned Jun 24, 2024