Common Grams filter should have configuration option #36771
Pinging @elastic/es-search
@Aezo I had a short discussion about this request with another team member and we were wondering why the current behaviour doesn't work for you. For example, in the "Samsung 64 GB Gold" case, this would only create one more token ("gb gold"), which should be quite rare and shouldn't really result in any loss of precision. The way we currently see this feature request is that it would add some complexity without much benefit. If you could explain your pain points with the current way the filter works, this might change our understanding of the problem.
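To make the "one more token" point concrete, here is a simplified Python model of the current filter with query_mode off. It assumes "gb" is the only configured common word, lowercases and splits on whitespace, and ignores the positions and token types the real filter assigns; the function name is hypothetical.

```python
def common_grams_default(text, common_words):
    """Simplified model of common_grams without query_mode:
    every unigram is kept, and a bigram is added for each adjacent
    pair in which at least one of the two tokens is a common word."""
    tokens = text.lower().split()
    common = {w.lower() for w in common_words}
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        # Form a bigram if this token or the next one is common.
        if i + 1 < len(tokens) and (tok in common or tokens[i + 1] in common):
            out.append(f"{tok}_{tokens[i + 1]}")
    return out


print(common_grams_default("Samsung 64 GB Gold", ["gb"]))
# ['samsung', '64', '64_gb', 'gb', 'gb_gold', 'gold']
```

Compared with the desired "samsung", "64_gb", "gold", the extra bigram is "gb_gold" (alongside the unigrams that query_mode would prune).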
@cbuescher It doesn't work for me because, with If search query is "Samsung Galaxy A6 (Gold)", the tokens generated would be - This is a problem because, let's say I have two phones in my documents, one "Samsung Galaxy A6 64 GB (Red)" and a second "Samsung Galaxy A6 64 GB (Gold)"; I would like to show the Gold one on top. Yes, disabling
Just to clarify, why would you be using a different search time analyzer than the index time analyzer here?
What made you conclude that I'm using a different search analyzer and index analyzer? I'll be using the same analyzer at both index time and search time.
This should also generate
Sorry for that, I misread your comment and got confused by the fact that you stated you are losing the "gold" token. As @romseygeek mentioned, using the current
Both should contain the "gold" token, the
This returns the "Samsung Galaxy A6 64 GB (Gold)" document first.
@cbuescher I'm sorry for not being clear. I've edited my original comment to remove that confusion. @romseygeek @cbuescher
Under which circumstances is the
When you don't want the document to even match when based solely on the Another choice would be to use shingles, but that's very limiting.
If someone searches for "32 GB", it will pass through the search analyzer (and hence use
In that case, I'll have to use That means, if my doc contains "Samsung Galaxy A6 64 GB (Gold)", tokens would be So if someone searches for "Nintendo 64", the above doc will also get matched, which isn't right. "64" and "GB" make sense only together; you wouldn't want to create separate tokens for them even at index time.
You can also use a @romseygeek @cbuescher after reading the documentation of the
Yeah, so there are 2 choices. But the only advantage of common_grams over pattern replace is the ability to supply the word list via a file, in case I need to add more words later.
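For comparison, the pattern-replace route can be approximated in Python: join a number and the unit word that follows it into a single token before tokenizing. This is only an analogue of that idea, not Elasticsearch configuration; the regex, the one-word unit list ("gb") and the analyze helper are hypothetical.

```python
import re

# Hypothetical pattern: a number followed by the unit word "gb".
UNIT_PATTERN = re.compile(r"\b(\d+)\s+(gb)\b", re.IGNORECASE)


def analyze(text):
    """Join "<number> gb" into one token, then lowercase and split on word characters."""
    joined = UNIT_PATTERN.sub(r"\1_\2", text)
    return re.findall(r"\w+", joined.lower())


doc = analyze("Samsung Galaxy A6 64 GB (Gold)")
query = analyze("Nintendo 64")
print(doc)                    # ['samsung', 'galaxy', 'a6', '64_gb', 'gold']
print(set(doc) & set(query))  # set()
```

Because the document carries 64_gb rather than a standalone 64, a query for "Nintendo 64" shares no token with it, which is the behaviour asked for above.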
So are we developing this feature?
The goal of the
I think this is the gist of what you're trying to achieve and deserves a specific filter/solution.
You don't need a file to update an analyzer.
This has been open for quite a while with no activity, and hasn't had a lot of interest. For now I'm going to close this as something we aren't planning on implementing. We can re-open it later if needed.
Describe the feature: The common grams token filter should have a configuration option to specify whether common words should be combined with the left token, the right token, or both. query_mode, if true, would then remove only the joined tokens and leave the other tokens untouched.
The configuration option can be named "join_mode". Configuration can be given as such -
This is how the above 3 analysers will work -
join_mode: left
input string - "Samsung 64 GB Gold"
common_grams_left_analyser - "samsung", "64_gb", "gold"
join_mode: right
input string - "salt Rs 200"
common_grams_left_analyser - "salt", "rs_200"
join_mode: both (default)
This is the current behaviour, so it should be the default value of join_mode.
input string - "fox is brown"
common_grams_left_analyser - "fox_is", "is_brown"
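The proposed semantics can be sketched in Python. Note that join_mode and the changed query_mode behaviour are only a proposal, not existing options of the filter; the function name, whitespace tokenization and the common-word lists are assumptions made for illustration.

```python
def common_grams_join(text, common_words, join_mode="both", query_mode=True):
    """Hypothetical model of the proposed join_mode option.

    left:  a common word is joined with the token before it
    right: a common word is joined with the token after it
    both:  bigrams on both sides (the current behaviour)
    With query_mode=True, only the unigrams that were joined into a
    bigram are dropped; all other tokens are kept.
    """
    if join_mode not in ("left", "right", "both"):
        raise ValueError(f"unknown join_mode: {join_mode}")
    tokens = text.lower().split()
    common = {w.lower() for w in common_words}
    bigrams = {}    # start index -> bigram text
    joined = set()  # indices of unigrams consumed by some bigram
    for i in range(len(tokens) - 1):
        left, right = tokens[i], tokens[i + 1]
        if join_mode == "left":
            make = right in common
        elif join_mode == "right":
            make = left in common
        else:
            make = left in common or right in common
        if make:
            bigrams[i] = f"{left}_{right}"
            joined.update((i, i + 1))
    out = []
    for i, tok in enumerate(tokens):
        if not (query_mode and i in joined):
            out.append(tok)
        if i in bigrams:
            out.append(bigrams[i])
    return out


print(common_grams_join("Samsung 64 GB Gold", ["gb"], join_mode="left"))
# ['samsung', '64_gb', 'gold']
print(common_grams_join("salt Rs 200", ["rs"], join_mode="right"))
# ['salt', 'rs_200']
print(common_grams_join("fox is brown", ["is"], join_mode="both"))
# ['fox_is', 'is_brown']
```

These reproduce the three examples listed above. One question the sketch leaves open is how left mode should treat a common word at the very start of the input, which has no token to join with; here it simply stays a unigram.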