update milvus readme with milvus lite
Signed-off-by: ChengZi <[email protected]>
zc277584121 committed May 30, 2024
1 parent bd82ed2 commit 70cf166
Showing 1 changed file with 59 additions and 26 deletions: `integrations/milvus-document-store.md`
- [Haystack 2.0](#haystack-20)
- [Installation](#installation)
- [Usage](#usage)
- [Dive deep usage](#dive-deep-usage)
- [Haystack 1.x](#haystack-1x)
- [Installation (1.x)](#installation-1x)
- [Usage (1.x)](#usage-1x)
---

### Installation
```shell
pip install --upgrade pymilvus milvus-haystack
```

### Usage

Use the `MilvusDocumentStore` in a Haystack pipeline as a quick start.

```python
from haystack import Document
from milvus_haystack import MilvusDocumentStore

document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},  # Milvus Lite
    # connection_args={"uri": "http://localhost:19530"},  # Milvus standalone docker service.
    drop_old=True,
)
documents = [Document(
    content="A Foo Document",
    meta={"page": "100", "chapter": "intro"},
    embedding=[-10.0] * 128,
)]
document_store.write_documents(documents)
print(document_store.count_documents())  # 1
```
In the `connection_args`, setting the URI to a local file path, e.g. `./milvus.db`, is the most convenient method, as it automatically uses [Milvus Lite](https://milvus.io/docs/milvus_lite.md) to store all data in that file.

If you have a large amount of data, such as more than a million documents, we recommend setting up a more performant Milvus server on [Docker or Kubernetes](https://milvus.io/docs/quickstart.md). In that setup, use the server URI, e.g. `http://localhost:19530`, as your URI.
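As a small illustrative sketch, the choice between the two URI styles can be wrapped in a helper. The function name and the one-million-document threshold below are assumptions for illustration, not part of the milvus-haystack API:

```python
def choose_connection_args(num_docs: int, server_uri: str = "http://localhost:19530") -> dict:
    """Pick Milvus connection args based on expected scale (illustrative only).

    Below roughly a million documents, Milvus Lite (a single local file) is
    usually sufficient; beyond that, point at a standalone Milvus server.
    """
    if num_docs < 1_000_000:
        return {"uri": "./milvus.db"}  # Milvus Lite: all data stored in this file
    return {"uri": server_uri}  # standalone / distributed Milvus server

print(choose_connection_args(10_000))     # small corpus -> {'uri': './milvus.db'}
print(choose_connection_args(5_000_000))  # large corpus -> {'uri': 'http://localhost:19530'}
```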

### Dive deep usage

Prepare an OpenAI API key and set it as an environment variable:

```shell
export OPENAI_API_KEY=<your_api_key>
```
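The pipelines below fail with an unhelpful error if the key is missing, so a fail-fast check can be handy. The `require_env` helper here is a hypothetical convenience, not part of Haystack:

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

# Uncomment to verify the key is set before building the pipelines:
# api_key = require_env("OPENAI_API_KEY")
```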

The following examples show how to:

- Create the indexing pipeline
- Create the retrieval pipeline
- Create the RAG pipeline

#### Create the indexing Pipeline and index some documents

```python
import os

from haystack import Pipeline
from haystack.components.converters import MarkdownToDocument
from haystack.components.embedders import OpenAIDocumentEmbedder, OpenAITextEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.components.writers import DocumentWriter

from milvus_haystack import MilvusDocumentStore
from milvus_haystack.milvus_embedding_retriever import MilvusEmbeddingRetriever

current_file_path = os.path.abspath(__file__)
file_paths = [current_file_path] # You can replace it with your own file paths.

document_store = MilvusDocumentStore(
    connection_args={"uri": "./milvus.db"},  # Milvus Lite
    # connection_args={"uri": "http://localhost:19530"},  # Milvus standalone docker service.
    drop_old=True,
)
indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", MarkdownToDocument())
indexing_pipeline.add_component("splitter", DocumentSplitter(split_by="sentence", split_length=2))
indexing_pipeline.add_component("embedder", OpenAIDocumentEmbedder())
indexing_pipeline.add_component("writer", DocumentWriter(document_store))
indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "embedder")
indexing_pipeline.connect("embedder", "writer")
indexing_pipeline.run({"converter": {"sources": file_paths}})

print("Number of documents:", document_store.count_documents())
```
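For intuition, the splitter step above can be approximated in plain Python. This is only a rough sketch of what `DocumentSplitter(split_by="sentence", split_length=2)` does, not Haystack's actual implementation:

```python
import re

def split_by_sentence(text: str, split_length: int = 2) -> list[str]:
    """Roughly mimic DocumentSplitter(split_by="sentence", split_length=2):
    cut text into sentences, then group every `split_length` sentences
    into one chunk."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        " ".join(sentences[i : i + split_length])
        for i in range(0, len(sentences), split_length)
    ]

chunks = split_by_sentence("Milvus is a vector database. It scales well. Lite runs locally.")
print(chunks)  # ['Milvus is a vector database. It scales well.', 'Lite runs locally.']
```

Each chunk is then embedded and written to the document store by the rest of the pipeline.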

#### Create the retrieval pipeline and try a query

```python
question = "How to set the service uri with milvus lite?" # You can replace it with your own question.

retrieval_pipeline = Pipeline()
retrieval_pipeline.add_component("embedder", OpenAITextEmbedder())
retrieval_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
retrieval_pipeline.connect("embedder", "retriever")

retrieval_results = retrieval_pipeline.run({"embedder": {"text": question}})
for doc in retrieval_results["retriever"]["documents"]:
    print(doc.content)
    print("-" * 10)
```
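In a RAG setup, the retrieved documents are typically concatenated into a single prompt context. Here is a minimal sketch of that step; the `build_context` helper and its character budget are illustrative assumptions, not milvus-haystack API:

```python
def build_context(contents: list[str], max_chars: int = 2000) -> str:
    """Concatenate retrieved document contents into one prompt context,
    stopping before exceeding a rough character budget."""
    parts: list[str] = []
    used = 0
    for text in contents:
        if used + len(text) > max_chars:
            break  # drop the remaining, lower-ranked documents
        parts.append(text)
        used += len(text)
    return "\n".join(parts)

context = build_context(["doc one", "doc two"], max_chars=20)
print(context)
```

In practice Haystack's `PromptBuilder` performs this assembly via its template loop, as in the RAG pipeline below.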

#### Create the RAG pipeline and try a query

```python
from haystack.utils import Secret
from haystack.components.builders import PromptBuilder
from haystack.components.generators import OpenAIGenerator

prompt_template = """Answer the following query based on the provided context. If the context does
not include an answer, reply with 'I don't know'.

Query: {{ query }}
Documents:
{% for doc in documents %}
{{ doc.content }}
{% endfor %}
Answer:
"""

rag_pipeline = Pipeline()
rag_pipeline.add_component("text_embedder", OpenAITextEmbedder())
rag_pipeline.add_component("retriever", MilvusEmbeddingRetriever(document_store=document_store, top_k=3))
rag_pipeline.add_component("prompt_builder", PromptBuilder(template=prompt_template))
rag_pipeline.add_component("generator", OpenAIGenerator(api_key=Secret.from_token(os.getenv("OPENAI_API_KEY")),
                                                        generation_kwargs={"temperature": 0}))
rag_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")
rag_pipeline.connect("retriever.documents", "prompt_builder.documents")
rag_pipeline.connect("prompt_builder", "generator")

results = rag_pipeline.run({"text_embedder": {"text": question}, "prompt_builder": {"query": question}})
print('RAG answer:', results["generator"]["replies"][0])
```



## Haystack 1.x


