Document most similar documents pipeline
brandenchan committed Sep 20, 2021
1 parent 98f0012 commit a21bec7
Showing 1 changed file with 31 additions and 2 deletions.
33 changes: 31 additions & 2 deletions docs/latest/components/ready_made_pipelines.mdx
@@ -43,7 +43,7 @@ We typically pass the output of the Retriever to another component such as the R

`DocumentSearchPipeline` wraps the [Retriever](/components/retriever) into a pipeline. Note that this wrapper does not endow the Retrievers with additional functionality but instead allows them to be used consistently with other Haystack Pipeline objects and with the same familiar syntax. Creating this pipeline is as simple as passing the Retriever into the pipeline’s constructor:

-```python
+``` python
pipeline = DocumentSearchPipeline(retriever=retriever)

query = "Tell me something about that time when they play chess."
@@ -128,7 +128,7 @@ result = pipeline.run(query=query, params={"retriever": {"top_k": 10}, "reader":

You may access the answer and other information like the model’s confidence and original context via the `answers` key, in this manner:

-```python
+``` python
result["answers"]
>>> [{'answer': 'der Klang der Musik',
'score': 9.269367218017578,
@@ -209,4 +209,33 @@ Output:
],
...
}
```

+## MostSimilarDocumentsPipeline
+
+This pipeline finds the documents in your DocumentStore that are most similar to each document you specify by ID.
+
+Before running it, make sure that your indexed documents have embeddings attached.
+You can compute and store these embeddings with the `DocumentStore.update_embeddings()` method, which takes an embedding-based Retriever as its argument.
+
+``` python
+from haystack.pipeline import MostSimilarDocumentsPipeline
+
+msd_pipeline = MostSimilarDocumentsPipeline(document_store)  # a DocumentStore whose documents already have embeddings
+result = msd_pipeline.run(document_ids=[doc_id1, doc_id2, ...])  # IDs of documents already in document_store
+print(result)
+```
+
+Output:
+
+``` python
+[[
+{'text': "Southern California's economy is diver...",
+'score': 0.8605178832348279,
+'question': None,
+'meta': {'name': 'Southern_California'},
+'embedding': ...,
+'id': '6e26b1b78c48efc6dd6c888e72d0970b'},
+...
+]]
+```
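Conceptually, `MostSimilarDocumentsPipeline` looks up the stored embedding of each input document and ranks the other documents by how closely their embeddings match it. This produces one ranked list per input ID, which matches the nested `[[...]]` shape of the output. The following is a minimal pure-Python sketch of that idea, not Haystack's implementation. It assumes an in-memory dict mapping document IDs to embedding vectors and uses cosine similarity; a real DocumentStore may score with dot product instead, depending on its configuration. The sketch also excludes each query document from its own results, which is a design choice made here and not a claim about the pipeline's behavior. The names `cosine_similarity` and `most_similar` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(
    embeddings: Dict[str, List[float]],  # hypothetical: doc ID -> embedding
    document_ids: List[str],
    top_k: int = 5,
) -> List[List[Tuple[str, float]]]:
    """Return, for each query ID, the top_k (doc_id, score) pairs, best first."""
    results = []
    for query_id in document_ids:
        query_emb = embeddings[query_id]
        # Score every other stored document against the query document.
        scored = [
            (doc_id, cosine_similarity(query_emb, emb))
            for doc_id, emb in embeddings.items()
            if doc_id != query_id
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        results.append(scored[:top_k])
    return results


# Usage: "b" points almost the same way as "a", so it ranks above "c".
toy_embeddings = {"a": [1.0, 0.0], "b": [0.9, 0.1], "c": [0.0, 1.0]}
print(most_similar(toy_embeddings, ["a"], top_k=2))
```

A brute-force scan like this compares each query against every stored document. A DocumentStore backed by a vector index can answer the same question without that full scan, which is one reason the pipeline delegates retrieval to the store.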
