Hybrid doc search e2e fails with DuplicateDocumentError #7788

julian-risch · 2024-06-03T06:37:26Z

Here is the test run https://github.com/deepset-ai/haystack/actions/runs/9342231146/job/25710027363
which fails with a DuplicateDocumentError.

We are calling write_documents twice, writing documents with the same content once without embeddings and once with embeddings:

        hybrid_pipeline.get_component("bm25_retriever").document_store.write_documents(documents)
        doc_embedder = SentenceTransformersDocumentEmbedder(model="sentence-transformers/all-MiniLM-L6-v2")
        doc_embedder.warm_up()
        embedded_documents = doc_embedder.run(documents=documents)["documents"]
>       hybrid_pipeline.get_component("embedding_retriever").document_store.write_documents(embedded_documents)

The error has occurred since we merged "feat: Add memory sharing between different instances of InMemoryDocumentStore".


silvanocerza mentioned this issue Jun 3, 2024

fix: InMemoryDocumentStore not sharing some document stats with other instances #7792

Merged

silvanocerza self-assigned this Jun 3, 2024

silvanocerza closed this as completed in #7792 Jun 4, 2024
