Enable stemming and choosing tokenizer, when doing full text search in tantivy #1315

Open
josca42 opened this issue May 19, 2024 · 1 comment · May be fixed by #1356
Labels
enhancement New feature or request

Comments

josca42 commented May 19, 2024

SDK

Python

Description

Enabling stemming and using a language-specific tokenizer tends to improve recall quite a bit in full-text search.
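
To illustrate the recall point, here is a toy, dependency-free sketch. The `naive_stem` suffix stripper and the `matches` helper are made up for this example and are far cruder than the stemmers tantivy actually uses; they only show why matching on stems finds more documents than matching on exact tokens.

```python
def naive_stem(token: str) -> str:
    """Toy suffix stripper for illustration only; not a real stemming algorithm."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of the word remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def matches(query: str, doc: str, stem: bool) -> bool:
    """Return True if every query term appears in the document after normalization."""
    normalize = naive_stem if stem else (lambda t: t)
    doc_terms = {normalize(t) for t in doc.lower().split()}
    return all(normalize(t) in doc_terms for t in query.lower().split())


docs = [
    "indexing large tables",
    "the index was rebuilt",
    "indexes speed up search",
]

exact_hits = [d for d in docs if matches("index", d, stem=False)]
stemmed_hits = [d for d in docs if matches("index", d, stem=True)]

print(len(exact_hits), "hit(s) without stemming")
print(len(stemmed_hits), "hit(s) with stemming")
```

Without stemming only the document containing the literal token `index` matches; with stemming, `indexing` and `indexes` reduce to the same term and match as well.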

Tantivy has support for this through the tokenizer_name argument in add_text_field.

As far as I can tell, the change needed is to add a tokenizer_name argument to the following line:

The same tokenizer_name argument would then be added to the create_fts_index method.
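
A minimal sketch of that pass-through, under stated assumptions: `add_text_fields` is a hypothetical helper (not the existing LanceDB code), and `RecordingSchemaBuilder` is a stand-in that only records calls so the snippet runs without tantivy installed. The one thing taken from tantivy-py is that `add_text_field` accepts `stored` and `tokenizer_name` keyword arguments.

```python
from typing import Any, List, Tuple


class RecordingSchemaBuilder:
    """Stand-in for a tantivy schema builder; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, bool, str]] = []

    def add_text_field(
        self, name: str, stored: bool = False, tokenizer_name: str = "default"
    ) -> None:
        self.calls.append((name, stored, tokenizer_name))


def add_text_fields(
    schema_builder: Any,
    text_fields: List[str],
    tokenizer_name: str = "default",
) -> Any:
    """Hypothetical helper: register each text field, forwarding tokenizer_name.

    Defaulting to "default" keeps the current behavior for callers that
    do not pass the new argument.
    """
    for name in text_fields:
        schema_builder.add_text_field(
            name, stored=True, tokenizer_name=tokenizer_name
        )
    return schema_builder


builder = add_text_fields(
    RecordingSchemaBuilder(), ["title", "body"], tokenizer_name="en_stem"
)
print(builder.calls)
```

With tantivy-py installed, passing `tantivy.SchemaBuilder()` in place of the stand-in should work the same way, and `create_fts_index` would simply accept `tokenizer_name` and hand it down to a helper like this one.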

I would personally much prefer that the argument be exposed, rather than just enabling the English stemmer. Tantivy supports a few different language tokenizers, which I think a lot of people would like to use instead of English.

I can create a pull request with the suggested changes if you think it is a good idea :-).

josca42 added the enhancement (New feature or request) label May 19, 2024
wjones127 (Contributor) commented

This all sounds good to me. Feel free to make a PR :)

josca42 linked a pull request Jun 5, 2024 that will close this issue