DeepsetCloud appears in the DocumentStore Compatibility table, but not on the Document Stores page.
I would add Qdrant to this table, even if it is an external integration.
Hi @anakin87, thank you for the recommendations. I implemented changes in both docs :)
Just a couple of comments:
We decided not to recommend a vector specialist at the moment: the best choice varies by use case, so we are not yet comfortable pointing to any specific one in the docs.
I erased deepsetCloudDocumentStore from the DocumentStore Compatibility table – it is not intended for production use anyway and is specific to deepsetCloud users.
Haystack docs on Document Stores and Retrievers are extremely important for those new to the framework, but I feel they could be improved.
I will try to list some personal opinions that have come to mind, also based on my interactions with the community.
Document Stores page
Approximate Nearest Neighbors Search
The list of DBs supporting ANN is probably out of date.
Choosing the Right Document Store
OpenDistroElasticsearchDocumentStore is still present, but has been removed in "chore!: remove deprecated OpenDistroElasticsearchDocumentStore" (#4361).
In general, this table should be reviewed/updated carefully.
Our Recommendations
In the past, a pure Vector DB was also proposed (this was Milvus).
After the deprecation of Milvus, we are suggesting Elasticsearch as an all-round solution (while mentioning that it can be "slow for dense retrieval with more than ~ 1 Mio documents").
It would probably also be better to propose a Vector specialist (Weaviate? Qdrant?).
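To make the "slow for dense retrieval" caveat concrete, here is a minimal sketch of exhaustive (exact) dense retrieval in plain Python. It is only an illustration of why cost grows with the number of documents, not a description of how Elasticsearch or any particular Document Store implements search. The function names, vector dimension, and random data are all assumptions made up for this example.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k=3):
    """Score the query against every stored vector.

    The work per query is proportional to len(vectors) * len(query),
    which is why exhaustive search gets slower as the index grows.
    """
    scored = [(cosine(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)  # highest similarity first
    return scored[:k]


# Hypothetical toy index: 1000 random 8-dimensional "embeddings".
random.seed(0)
dim = 8
vectors = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(1000)]

# Use a stored vector as the query, so it should be its own best match.
query = vectors[42]
top = exact_top_k(query, vectors)
```

Vector specialists typically avoid this full scan by using approximate nearest neighbor indexes, trading a little recall for much lower latency.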
FAISSDocumentStore?
(Somewhat related to the previous point.)
I often see people using FAISSDocumentStore, which seems to me a thin and imperfect implementation built on FAISS.
These users often encounter problems.
I would suggest not promoting FAISSDocumentStore much, since there are more powerful, well-designed and integrated vector DBs today.
Retrievers page
DocumentStore Compatibility table
DPR
Embedding Retrieval is recommended but DPR is still described as "a highly performant retrieval method".
You know the NLP domain better than I do, but my impression is that most of the available DPR models perform worse than Sentence Transformers models, especially for out-of-domain retrieval (see also the BEIR paper).
Therefore, I would probably include a less positive description of DPR.
TF-IDF
Perhaps we should add a little hint that BM25 is generally better...
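For the docs hint, a small sketch like the following could show where the two scoring functions differ. It is a simplified, self-contained illustration, not Haystack's or Elasticsearch's implementation. The toy corpus, the whitespace tokenizer, and the defaults `k1=1.5` and `b=0.75` are assumptions chosen for the example.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive whitespace tokenizer (an assumption for this sketch)."""
    return text.lower().split()


def tfidf_score(query, doc, docs):
    """Plain TF-IDF: raw term frequency times log(N / df)."""
    n = len(docs)
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df:
            score += tf[term] * math.log(n / df)
    return score


def bm25_score(query, doc, docs, k1=1.5, b=0.75):
    """Okapi BM25: saturating term frequency plus length normalization."""
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    tf = Counter(doc)
    score = 0.0
    for term in query:
        df = sum(1 for d in docs if term in d)
        if df:
            idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
            length_norm = k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
    return score


# Hypothetical toy corpus: document 0 repeats a query term many times,
# document 1 is short and matches both query terms once.
corpus = [
    "faiss faiss faiss faiss faiss faiss index notes and other unrelated filler words here",
    "faiss vector index",
    "elasticsearch keyword search",
    "bm25 ranking function",
]
docs = [tokenize(text) for text in corpus]
query = tokenize("faiss index")

tfidf = [tfidf_score(query, d, docs) for d in docs]
bm25 = [bm25_score(query, d, docs) for d in docs]
```

On this toy corpus, TF-IDF ranks the long, repetitive document first because its score grows linearly with term frequency, while BM25 ranks the short, focused document first. A single toy example is not evidence about real benchmarks, but it shows the two mechanisms (term-frequency saturation and length normalization) that usually make BM25 the safer default.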
In general, I would try to make BM25 and EmbeddingRetriever more prominent and visible, moving everything else further down the page (Multihop, Table retrieval and Multimodal retrieval).
@dfokina feel free to discard my opinions if they do not make sense, and to involve other people in the discussion as well!
😃