This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

New docs version #159

Merged
merged 9 commits into from
Sep 23, 2021
dorpdown for pros and cons documentstore
PiffPaffM committed Sep 21, 2021
commit cad755f45c8ccbb76c6562b3b736a5b4ef4ac0c7
199 changes: 111 additions & 88 deletions docs/latest/components/document_store.mdx
@@ -194,94 +194,117 @@ Having GPU acceleration will significantly speed this up.

The Document Stores have different characteristics. You should choose one depending on the maturity of your project, the use case and technical environment:

### Elasticsearch

**Pros:**

- Fast & accurate sparse retrieval with many tuning options
- Basic support for dense retrieval
- Production-ready
- Also supports Open Distro

**Cons:**

- Slow for dense retrieval with more than ~1 million documents

<div style={{ marginBottom: "3rem" }} />

### Milvus

**Pros:**

- Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR)
- Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability
- Runs as a separate service (e.g. a Docker container)
- Allows dynamic data management

**Cons:**

- No efficient sparse retrieval

<div style={{ marginBottom: "3rem" }} />

### FAISS

**Pros:**

- Fast & accurate dense retrieval
- Highly scalable due to approximate nearest neighbour algorithms (ANN)
- Many options to tune dense retrieval via different index types (more info [here](https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index))

**Cons:**

- No efficient sparse retrieval

<div style={{ marginBottom: "3rem" }} />

### In Memory

**Pros:**

- Simple
- Exists already in many environments

**Cons:**

- Only compatible with minimal TF-IDF Retriever
- Bad retrieval performance
- Not recommended for production

<div style={{ marginBottom: "3rem" }} />

### SQL

**Pros:**

- Simple & fast to test
- No database requirements
- Supports MySQL, PostgreSQL and SQLite

**Cons:**

- Not scalable
- Does not persist your data on disk

<div style={{ marginBottom: "3rem" }} />

### Weaviate

**Pros:**

- Simple vector search
- Stores everything in one place (documents, metadata and vectors), so there is less network overhead when scaling up
- Allows combination of vector search and scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset

**Cons:**

- Fewer options for ANN algorithms than FAISS or Milvus
- No BM25 / TF-IDF retrieval

<div style={{ marginBottom: "3rem" }} />
<Disclosures
options={[
{
title: "Elasticsearch",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Fast & accurate sparse retrieval with many tuning options</li>
<li>Basic support for dense retrieval</li>
<li>Production-ready</li>
<li>Also supports Open Distro</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>Slow for dense retrieval with more than ~1 million documents</li>
</ul>
</div>
)
},
{
title: "Milvus",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Scalable DocumentStore that excels at handling vectors (hence suited to dense retrieval methods like DPR)</li>
<li>Encapsulates multiple ANN libraries (e.g. FAISS and ANNOY) and provides added reliability</li>
<li>Runs as a separate service (e.g. a Docker container)</li>
<li>Allows dynamic data management</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>No efficient sparse retrieval</li>
</ul>
</div>
)
},
{
title: "FAISS",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Fast & accurate dense retrieval</li>
<li>Highly scalable due to approximate nearest neighbour algorithms (ANN)</li>
<li>Many options to tune dense retrieval via different index types (more info <a href="https://github.com/facebookresearch/faiss/wiki/Guidelines-to-choose-an-index">here</a>)</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>No efficient sparse retrieval</li>
</ul>
</div>
)
},
{
title: "In Memory",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Simple</li>
<li>Exists already in many environments</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>Only compatible with minimal TF-IDF Retriever</li>
<li>Bad retrieval performance</li>
<li>Not recommended for production</li>
</ul>
</div>
)
},
{
title: "SQL",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Simple & fast to test</li>
<li>No database requirements</li>
<li>Supports MySQL, PostgreSQL and SQLite</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>Not scalable</li>
<li>Does not persist your data on disk</li>
</ul>
</div>
)
},
{
title: "Weaviate",
content: (
<div>
<strong>Pros:</strong>
<ul>
<li>Simple vector search</li>
<li>Stores everything in one place (documents, metadata and vectors), so there is less network overhead when scaling up</li>
<li>Allows combination of vector search and scalar filtering, i.e. you can filter for a certain tag and do dense retrieval on that subset</li>
</ul>
<strong>Cons:</strong>
<ul>
<li>Fewer options for ANN algorithms than FAISS or Milvus</li>
<li>No BM25 / TF-IDF retrieval</li>
</ul>
</div>
)
}
]}
/>
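
The trade-offs listed above can be read as a simple filter over the available stores. The following is a minimal sketch of that idea, not part of the library: `STORES` and `suggest_stores` are hypothetical names, the table covers only four of the stores, and the 1,000,000 document threshold is the rough "~1 million" figure from the Elasticsearch entry rather than a measured limit.

```python
from typing import Dict, List, Optional

# Hypothetical summary of the pros and cons listed above.
# "sparse": whether the list credits the store with efficient sparse retrieval.
# "dense_limit": rough document count above which dense retrieval is described
# as slow (None means the list states no such limit).
STORES: Dict[str, Dict[str, Optional[object]]] = {
    "Elasticsearch": {"sparse": True, "dense_limit": 1_000_000},
    "Milvus": {"sparse": False, "dense_limit": None},
    "FAISS": {"sparse": False, "dense_limit": None},
    "Weaviate": {"sparse": False, "dense_limit": None},
}


def suggest_stores(needs_sparse: bool, needs_dense: bool, num_docs: int) -> List[str]:
    """Return the stores whose listed characteristics fit the given needs."""
    suggestions = []
    for name, traits in STORES.items():
        # Skip stores that the list says have no efficient sparse retrieval.
        if needs_sparse and not traits["sparse"]:
            continue
        # Skip stores whose dense retrieval is described as slow at this size.
        limit = traits["dense_limit"]
        if needs_dense and limit is not None and num_docs > limit:
            continue
        suggestions.append(name)
    return suggestions


if __name__ == "__main__":
    print(suggest_stores(needs_sparse=False, needs_dense=True, num_docs=5_000_000))
```

An empty result (for example, sparse and dense retrieval over several million documents) simply means that no single store in this table fits every requirement; it says nothing about stores left out of the table.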

<div className="max-w-xl bg-yellow-light-theme border-l-8 border-yellow-dark-theme px-6 pt-6 pb-4 my-4 rounded-md dark:bg-yellow-900">
