Describe the bug
When combining a question-answering PromptNode with a retriever, I run into two errors:
1. Only the first retrieved document is put into the prompt; the remaining retrieved documents are ignored.
2. The prompt parameter naming is not correct: in my setup the node only has access to `query`, not `questions`.
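To make both symptoms concrete, here is a minimal sketch using only Python's `string.Template`. It is not Haystack's implementation: the template wording, the `render_first_only` / `render_all` helpers, and the document strings are hypothetical. It contrasts the observed behaviour (only `documents[0]` is substituted) with the expected one (all documents joined, and the pipeline's `query` filling the template's `questions` slot).

```python
from string import Template

# Hypothetical template text for illustration only; Haystack's actual
# "question-answering" template may be worded differently.
QA_TEMPLATE = Template(
    "Given the context please answer the question. "
    "Context: $documents; Question: $questions; Answer:"
)


def render_first_only(documents, query):
    # Mimics the reported behaviour: only the first document is substituted.
    return QA_TEMPLATE.substitute(documents=documents[0], questions=query)


def render_all(documents, query):
    # Expected behaviour: every retrieved document is joined into the context,
    # and the pipeline's `query` fills the template's `questions` slot.
    return QA_TEMPLATE.substitute(documents=" ".join(documents), questions=query)


docs = ["First document text.", "Second document text.", "Third document text."]
query = "Who is the father of Arya Stark?"
print(render_first_only(docs, query))
print(render_all(docs, query))
```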
Error message
For 1:
The error is silent: you need to look at the debug output for the prompt. The input to the node contains all documents; only the final prompt is missing them.
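Because nothing is raised, one way to catch this is to compare the prompt from the DEBUG log against the retriever's output. The sketch below is a plain-Python check, not a Haystack feature; `missing_documents` and the sample strings are hypothetical, and in practice you would paste in the logged prompt and the retrieved documents' content.

```python
from typing import List


def missing_documents(prompt: str, documents: List[str]) -> List[int]:
    """Return the indices of documents whose text is absent from the prompt."""
    return [i for i, doc in enumerate(documents) if doc not in prompt]


# Hypothetical values: in practice, copy the prompt from the DEBUG log and the
# document contents from the retriever's output.
retrieved = ["First document text.", "Second document text."]
logged_prompt = (
    "Context: First document text.; "
    "Question: Who is the father of Arya Stark?; Answer:"
)

dropped = missing_documents(logged_prompt, retrieved)
if dropped:
    print(f"{len(dropped)} of {len(retrieved)} documents missing from prompt: {dropped}")
```

A substring check like this assumes the document text is inserted verbatim; if the template truncates or reformats documents, a looser comparison would be needed.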
For 2:
Exception: Exception while running node 'qa_prompt_node': Expected prompt params ['documents', 'questions'] but got ['documents', 'labels', 'stop_words', 'query']
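The exception suggests the template declares `documents` and `questions`, while the pipeline hands the node `documents`, `labels`, `stop_words` and `query`. The following sketch reproduces that mismatch and shows how an alias from `query` to `questions` would resolve it. `resolve_prompt_params` and its `aliases` argument are hypothetical and only illustrate the idea; they are not Haystack's parameter-resolution code.

```python
from typing import Any, Dict, List, Optional


def resolve_prompt_params(
    expected: List[str],
    provided: Dict[str, Any],
    aliases: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Map pipeline inputs onto template parameters, renaming via `aliases`."""
    aliases = aliases or {}
    renamed = {aliases.get(name, name): value for name, value in provided.items()}
    missing = [p for p in expected if p not in renamed]
    if missing:
        raise ValueError(f"Expected prompt params {expected} but got {list(provided)}")
    # Keep only the parameters the template declares; extras such as
    # `labels` and `stop_words` are dropped.
    return {p: renamed[p] for p in expected}


inputs = {
    "documents": ["doc 1", "doc 2"],
    "labels": None,
    "stop_words": None,
    "query": "Who is the father of Arya Stark?",
}
expected = ["documents", "questions"]

try:
    resolve_prompt_params(expected, inputs)
except ValueError as err:
    print(err)  # same shape as the reported error

print(resolve_prompt_params(expected, inputs, aliases={"query": "questions"}))
```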
To Reproduce
import logging
from pathlib import Path

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.utils import fetch_archive_from_http, print_answers, launch_es
from haystack.nodes import FARMReader, BM25Retriever, EmbeddingRetriever
from haystack.nodes.file_classifier import FileTypeClassifier
from haystack.nodes.preprocessor import PreProcessor
from haystack.nodes.file_converter import TextConverter
from haystack.nodes.prompt import PromptNode, PromptTemplate, PromptModel
from haystack.pipelines import Pipeline

logging.basicConfig(format="%(levelname)s - %(name)s - %(message)s", level=logging.WARNING)
# DEBUG on the haystack logger shows the final prompt sent to the model
logging.getLogger("haystack").setLevel(logging.DEBUG)


def basic_qa_pipeline():
    # Initialize a DocumentStore
    document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

    # Fetch, pre-process and write documents
    doc_dir = "data/basic_qa_pipeline"
    s3_url = "https://core-engineering.s3.eu-central-1.amazonaws.com/public/scripts/wiki_gameofthrones_txt1.zip"
    fetch_archive_from_http(url=s3_url, output_dir=doc_dir)
    file_paths = [p for p in Path(doc_dir).glob("**/*")]
    files_metadata = [{"name": path.name} for path in file_paths]

    # Indexing Pipeline
    indexing_pipeline = Pipeline()
    # Makes sure the file is a TXT file (FileTypeClassifier node)
    classifier = FileTypeClassifier()
    indexing_pipeline.add_node(classifier, name="Classifier", inputs=["File"])
    # Converts a file into text and performs basic cleaning (TextConverter node)
    text_converter = TextConverter(remove_numeric_tables=True)
    indexing_pipeline.add_node(text_converter, name="Text_converter", inputs=["Classifier.output_1"])
    # Pre-processes the text by performing splits and adding metadata to the text (PreProcessor node)
    preprocessor = PreProcessor(
        clean_whitespace=True,
        clean_empty_lines=True,
        split_length=150,
        split_overlap=10,
        split_respect_sentence_boundary=True,
    )
    indexing_pipeline.add_node(preprocessor, name="Preprocessor", inputs=["Text_converter"])
    # Writes the resulting documents into the document store
    indexing_pipeline.add_node(document_store, name="Document_Store", inputs=["Preprocessor"])

    # Initialize the Retriever
    # retriever = BM25Retriever(document_store=document_store)
    retriever = EmbeddingRetriever(
        document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
    )

    # Run the indexing pipeline with the documents and their metadata as input, then embed them
    indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
    document_store.update_embeddings(retriever)

    # Define the PromptModel (it can be shared between several PromptNodes)
    prompt_open_ai = PromptModel(
        model_name_or_path="text-davinci-003", api_key="api KEY"
    )

    # Initialize the PromptNode that answers the question
    qa_prompt_node = PromptNode(
        prompt_open_ai, default_prompt_template="question-answering"
    )  # "question-answering_citing"

    # Define the query pipeline
    pipeline = Pipeline()
    pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])
    pipeline.add_node(component=qa_prompt_node, name="qa_prompt_node", inputs=["retriever"])
    output = pipeline.run(query="Who is the father of Arya Stark?", params={"retriever": {"top_k": 10}})
    print(output["results"])


if __name__ == "__main__":
    launch_es(delete_existing=False)
    basic_qa_pipeline()