PromptNode errors for Retriever + PromptNode with QA template #4047

Closed
Timoeller opened this issue Feb 2, 2023 · 0 comments · Fixed by #4378
Labels: topic:LLM, type:bug (Something isn't working)

Timoeller commented Feb 2, 2023

Describe the bug
When combining a QA PromptNode with a retriever, I get two errors:

  1. Only the first retrieved document is put into the prompt; the remaining retrieved documents are ignored.
  2. The naming of the prompt variables is incorrect: in my setup I only have access to query, not questions.

Error message
For 1: the error is silent; you have to look at the debug output for the prompt (the input to the node contains all documents, only the final prompt is missing them). A sketch for inspecting this follows below.
For 2:
Exception: Exception while running node 'qa_prompt_node': Expected prompt params ['documents', 'questions'] but got ['documents', 'labels', 'stop_words', 'query']
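
For reference, a minimal sketch of how to surface the silent error 1, assuming the pipeline from the reproduction script below and Haystack 1.x's debug=True support in Pipeline.run (the node names here match that script):

# Run with debugging enabled so each node's input and output are recorded
output = pipeline.run(
    query="Who is the father of Arya Stark?",
    params={"retriever": {"top_k": 10}},
    debug=True,
)
# The node input contains all 10 retrieved documents ...
print(output["_debug"]["qa_prompt_node"]["input"])
# ... but the rendered prompt in the node output contains only the first one
print(output["_debug"]["qa_prompt_node"]["output"])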

To Reproduce

import logging
from pathlib import Path

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.utils import fetch_archive_from_http, launch_es
from haystack.nodes import EmbeddingRetriever
from haystack.nodes.file_classifier import FileTypeClassifier
from haystack.nodes.preprocessor import PreProcessor
from haystack.nodes.file_converter import TextConverter
from haystack.nodes.prompt import PromptNode, PromptModel
from haystack.pipelines import Pipeline

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.DEBUG)
logging.getLogger("haystack").setLevel(logging.DEBUG)


def basic_qa_pipeline():
    # Initialize a DocumentStore
    document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

    # fetch, pre-process and write documents
    doc_dir = "data/basic_qa_pipeline"
    s3_url = "https://core-engineering.s3.eu-central-1.amazonaws.com/public/scripts/wiki_gameofthrones_txt1.zip"
    fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

    file_paths = [p for p in Path(doc_dir).glob("**/*")]
    files_metadata = [{"name": path.name} for path in file_paths]

    # Indexing Pipeline
    indexing_pipeline = Pipeline()

    # Makes sure the file is a TXT file (FileTypeClassifier node)
    classifier = FileTypeClassifier()
    indexing_pipeline.add_node(classifier, name="Classifier", inputs=["File"])

    # Converts a file into text and performs basic cleaning (TextConverter node)
    text_converter = TextConverter(remove_numeric_tables=True)
    indexing_pipeline.add_node(text_converter, name="Text_converter", inputs=["Classifier.output_1"])

    # - Pre-processes the text by performing splits and adding metadata to the text (Preprocessor node)
    preprocessor = PreProcessor(
        clean_whitespace=True,
        clean_empty_lines=True,
        split_length=150,
        split_overlap=10,
        split_respect_sentence_boundary=True,
    )
    indexing_pipeline.add_node(preprocessor, name="Preprocessor", inputs=["Text_converter"])

    # - Writes the resulting documents into the document store
    indexing_pipeline.add_node(document_store, name="Document_Store", inputs=["Preprocessor"])

    # Initialize the Retriever
    retriever = EmbeddingRetriever(
        document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
    )

    # Run the indexing pipeline with the documents and their metadata as input
    indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
    document_store.update_embeddings(retriever)

    # Define the prompt model used by the PromptNode
    prompt_open_ai = PromptModel(
        model_name_or_path="text-davinci-003", api_key="YOUR_API_KEY"
    )

    # Initialize the PromptNode that answers the question,
    # using the built-in "question-answering" prompt template
    qa_prompt_node = PromptNode(
        prompt_open_ai, default_prompt_template="question-answering"
    )  # alternative template: "question-answering_citing"

    # Time to define the pipeline:
    pipeline = Pipeline()
    pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])
    pipeline.add_node(component=qa_prompt_node, name="qa_prompt_node", inputs=["retriever"])
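    # With the stock "question-answering" template, the following run fails with the
    # "Expected prompt params ['documents', 'questions']" exception quoted above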
    output = pipeline.run(query="Who is the father of Arya Stark?", params={"retriever": {"top_k": 10}})
    print(output["results"])



if __name__ == "__main__":
    launch_es(delete_existing=False)
    basic_qa_pipeline()
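
A possible workaround for error 2 (it does not address the missing documents from error 1) is a custom template bound to query instead of questions. This is only a sketch, assuming the PromptTemplate(name=..., prompt_text=...) API with $-prefixed variables from current Haystack 1.x; the prompt wording is illustrative:

from haystack.nodes.prompt import PromptTemplate

# Same idea as the built-in "question-answering" template, but bound to
# $query (which the query pipeline actually supplies) instead of $questions
qa_with_query = PromptTemplate(
    name="question-answering-with-query",
    prompt_text="Given the context please answer the question. "
    "Context: $documents; Question: $query; Answer:",
)

qa_prompt_node = PromptNode(prompt_open_ai, default_prompt_template=qa_with_query)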