Accept Callables as Tokenizers for InMemoryDocumentStore #4720
Comments
How do we plan to accept callables? By defining …
If we rename … If we introduce a new param, it'll muddle things up, and we'd need to ensure that the two are never used simultaneously.
Will try to help with this one :)
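One way to avoid the two-parameter conflict raised above is a single parameter that accepts either a regex string or a callable and is normalized to one tokenizer function up front. The sketch below assumes that design; `resolve_tokenizer` and the `Tokenizer` alias are hypothetical names, not Haystack's API.

```python
import re
from typing import Callable, List, Union

# Hypothetical alias: a tokenizer maps one document string to a list of tokens.
Tokenizer = Callable[[str], List[str]]

# Default pattern quoted in the issue: runs of two or more word characters.
DEFAULT_PATTERN = r"(?u)\b\w\w+\b"


def resolve_tokenizer(tokenization: Union[str, Tokenizer] = DEFAULT_PATTERN) -> Tokenizer:
    """Turn a regex string or a callable into a single tokenizer callable.

    Hypothetical helper: because one parameter accepts both forms,
    there is no second parameter that could be set at the same time.
    """
    if isinstance(tokenization, str):
        # A string is treated as a regex; tokens are all non-overlapping matches.
        return re.compile(tokenization).findall
    if callable(tokenization):
        # A callable is used as-is.
        return tokenization
    raise TypeError("tokenization must be a regex string or a callable")


default_tok = resolve_tokenizer()
print(default_tok("a bb ccc"))               # single-character "a" is dropped
print(resolve_tokenizer(str.split)("x y"))   # a plain callable passes through
```

The trade-off is that the parameter's type widens to `Union[str, Callable]`, so any code that reads it afterwards should go through the resolved callable rather than assume a string.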
Discussed in #4695
Originally posted by farhanhubble April 18, 2023
`InMemoryDocumentStore` currently accepts a tokenization pattern only through the argument `bm25_tokenization_regex: str = r"(?u)\b\w\w+\b"`. The underlying BM25 implementation supports a `callable`, though. Removing this restriction would enable correct tokenization of a wider variety of corpora. I ran into this limitation while trying to index JSON documents that contain key-value pairs, like:
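A minimal sketch of the limitation, using a hypothetical flat JSON document (not the one from the original report). The default pattern keeps only runs of two or more word characters, so single-character values such as `"L"` and `"7"` disappear and every value is detached from its key. `json_kv_tokenize` is a hypothetical callable tokenizer, shown only to illustrate what a callable would allow.

```python
import json
import re
from typing import List

# Default pattern quoted in the issue.
DEFAULT_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def regex_tokenize(text: str) -> List[str]:
    """Tokenize the way a regex-only configuration would."""
    return DEFAULT_PATTERN.findall(text)


def json_kv_tokenize(text: str) -> List[str]:
    """Hypothetical callable tokenizer for flat JSON objects.

    Emits one "key=value" token per pair, so short values survive
    and each value stays attached to its key.
    """
    data = json.loads(text)
    return [f"{key}={value}" for key, value in data.items()]


# Hypothetical document for illustration.
doc = '{"id": "A-7", "size": "L", "color": "red"}'

print(regex_tokenize(doc))    # ['id', 'size', 'color', 'red']
print(json_kv_tokenize(doc))  # ['id=A-7', 'size=L', 'color=red']
```

No single regex can express "parse the JSON, then pair keys with values", which is why accepting a callable is the more general fix.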