code examples in elasticsearch-document-store.md throw errors #98

Closed
annthurium opened this issue Dec 20, 2023 · 0 comments · Fixed by #100
@annthurium
Contributor

I tried to run the code in elasticsearch-document-store.md and ran into some errors. I attempted to fix them but got slightly stuck. If someone could point me in the right direction, happy to open a PR.

The topmost block of code:

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
converter = TextFileToDocument()
splitter = DocumentSplitter()
doc_embedder = SentenceTransformersDocumentEmbedder(model_name_or_path="sentence-transformers/multi-qa-mpnet-base-dot-v1")
writer = DocumentWriter(document_store)

indexing_pipeline = Pipeline()
indexing_pipeline.add_component("converter", converter)
indexing_pipeline.add_component("splitter", splitter)
indexing_pipeline.add_component("doc_embedder", doc_embedder)
indexing_pipeline.add_component("writer", writer)

indexing_pipeline.connect("converter", "splitter")
indexing_pipeline.connect("splitter", "doc_embedder")
indexing_pipeline.connect("doc_embedder", "writer")

indexing_pipeline.run({
    "converter":{"sources":["filename.txt"]}
    })

Produces an error:

Failed to write documents to Elasticsearch. Errors:
[{'create': {'_index': 'default', '_id': '6383dc3ed51fc90c2e45704853a7ad9b14168f4f262ac7a0b65e02c465d0bb1c', 'status': 400, 'error': {'type': 'document_parsing_exception', 'reason': "[1:15833] failed to parse: The [dense_vector] field [embedding] in doc [document with id '6383dc3ed51fc90c2e45704853a7ad9b14168f4f262ac7a0b65e02c465d0bb1c'] has a different number of dimensions [768] than defined in the mapping [1024]", 'caused_by': {'type': 'illegal_argument_exception', 'reason': "The [dense_vector] field [embedding] in doc [document with id '6383dc3ed51fc90c2e45704853a7ad9b14168f4f262ac7a0b65e02c465d0bb1c'] has a different number of dimensions [768] than defined in the mapping [1024]"}}}}]'
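
The error reason states both numbers directly: the document's embedding has 768 dimensions, while the index mapping expects 1024. As a minimal sketch (the helper name `extract_dimension_mismatch` is hypothetical and not part of Haystack or the Elasticsearch client), the two values can be pulled out of the reason string like this:

```python
import re

# Matches the dimension-mismatch wording used in the error reason quoted above.
_DIMS_PATTERN = re.compile(
    r"has a different number of dimensions \[(\d+)\] "
    r"than defined in the mapping \[(\d+)\]"
)


def extract_dimension_mismatch(reason):
    """Return (document_dims, mapping_dims), or None if the reason does not match.

    Hypothetical helper for illustration only.
    """
    match = _DIMS_PATTERN.search(reason)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# The 'caused_by' reason from the error above, copied verbatim.
reason = (
    "The [dense_vector] field [embedding] in doc [document with id "
    "'6383dc3ed51fc90c2e45704853a7ad9b14168f4f262ac7a0b65e02c465d0bb1c'] "
    "has a different number of dimensions [768] "
    "than defined in the mapping [1024]"
)

dims = extract_dimension_mismatch(reason)
print(dims)
```

The first number is the length of the vector being written and the second is the `dims` value in the existing index mapping. Where the 1024 comes from is not shown in the error itself; it could be a default or a previously created `default` index, but that is an assumption.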

Other than the SentenceTransformersTextEmbedder, which of these components require us to specify a model_name_or_path? It wasn't easy to figure out from looking at the documentation or reading the Haystack source code. 🤔

With the second block of code, I'm running into the same error about mismatched embedding dimensions. There were also a few errors with parameter names and such that were easy to clean up:

from elasticsearch_haystack.document_store import ElasticsearchDocumentStore
from haystack.pipeline import Pipeline
from haystack.components.embedders import SentenceTransformersTextEmbedder 
from elasticsearch_haystack.embedding_retriever import ElasticsearchEmbeddingRetriever

model_name_or_path = "sentence-transformers/multi-qa-mpnet-base-dot-v1"

document_store = ElasticsearchDocumentStore(hosts="http://localhost:9200")
retriever = ElasticsearchEmbeddingRetriever(document_store=document_store)
text_embedder = SentenceTransformersTextEmbedder(model_name_or_path=model_name_or_path)

query_pipeline = Pipeline()
query_pipeline.add_component("text_embedder", text_embedder)
query_pipeline.add_component("retriever", retriever)
query_pipeline.connect("text_embedder.embedding", "retriever.query_embedding")

query_pipeline.run({"text_embedder": {"text": "historical places in Istanbul"}})
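
Since both blocks hit the same mismatch, one way to catch it before Elasticsearch rejects the request is to compare embedding lengths against the dimension the index expects. The sketch below uses plain Python lists as stand-ins for embeddings; `find_dimension_mismatches` is a hypothetical helper, not a Haystack or Elasticsearch API, and the expected value of 768 is taken from the error message above.

```python
def find_dimension_mismatches(embeddings, expected_dims):
    """Return the indices of embeddings whose length differs from expected_dims.

    Hypothetical helper for illustration only.
    """
    return [
        index
        for index, embedding in enumerate(embeddings)
        if len(embedding) != expected_dims
    ]


# Stand-in vectors: two with 768 values and one with 1024.
embeddings = [[0.0] * 768, [0.0] * 1024, [0.0] * 768]

mismatched = find_dimension_mismatches(embeddings, expected_dims=768)
print(mismatched)
```

A check like this could run on the embedder's output before the writer or retriever is called, so a mismatch shows up with a clear index rather than as a bulk-write failure.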
annthurium self-assigned this on Dec 20, 2023
annthurium changed the title from "code examples in elasticsearch-document-store.md throws errors" to "code examples in elasticsearch-document-store.md throw errors" on Dec 20, 2023
annthurium added the documentation label on Dec 21, 2023