PromptNode errors for Retriever + PromptNode with QA template #4047

Closed
Timoeller opened this issue Feb 2, 2023 · 0 comments · Fixed by #4378
Labels: topic:LLM, type:bug (Something isn't working)

Timoeller commented Feb 2, 2023

Describe the bug
When combining a QA PromptNode with a retriever, I get two errors:

  1. Only the first retrieved document is put into the prompt; the remaining retrieved documents are ignored.
  2. The naming of the prompt variables is incorrect: in my setup I only have access to query, not questions.

Error message
For 1: the error is silent; you have to look at the debug output for the prompt (the input to the node contains all documents, only the final prompt is missing them). A sketch for inspecting this follows below.
For 2:
Exception: Exception while running node 'qa_prompt_node': Expected prompt params ['documents', 'questions'] but got ['documents', 'labels', 'stop_words', 'query']
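
For reference, a minimal sketch of how to surface the silent error 1, assuming the pipeline from the reproduction script below and Haystack 1.x's debug=True support in Pipeline.run (the node names here match that script):

# Run with debugging enabled so each node's input and output are recorded
output = pipeline.run(
    query="Who is the father of Arya Stark?",
    params={"retriever": {"top_k": 10}},
    debug=True,
)
# The node input contains all 10 retrieved documents ...
print(output["_debug"]["qa_prompt_node"]["input"])
# ... but the rendered prompt in the node output contains only the first one
print(output["_debug"]["qa_prompt_node"]["output"])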

To Reproduce

import logging
from pathlib import Path

from haystack.document_stores import ElasticsearchDocumentStore
from haystack.utils import fetch_archive_from_http, launch_es
from haystack.nodes import EmbeddingRetriever
from haystack.nodes.file_classifier import FileTypeClassifier
from haystack.nodes.preprocessor import PreProcessor
from haystack.nodes.file_converter import TextConverter
from haystack.nodes.prompt import PromptNode, PromptModel
from haystack.pipelines import Pipeline

logging.basicConfig(format="%(levelname)s - %(name)s -  %(message)s", level=logging.DEBUG)
logging.getLogger("haystack").setLevel(logging.DEBUG)


def basic_qa_pipeline():
    # Initialize a DocumentStore
    document_store = ElasticsearchDocumentStore(host="localhost", username="", password="", index="document")

    # fetch, pre-process and write documents
    doc_dir = "data/basic_qa_pipeline"
    s3_url = "https://core-engineering.s3.eu-central-1.amazonaws.com/public/scripts/wiki_gameofthrones_txt1.zip"
    fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

    file_paths = [p for p in Path(doc_dir).glob("**/*")]
    files_metadata = [{"name": path.name} for path in file_paths]

    # Indexing Pipeline
    indexing_pipeline = Pipeline()

    # Makes sure the file is a TXT file (FileTypeClassifier node)
    classifier = FileTypeClassifier()
    indexing_pipeline.add_node(classifier, name="Classifier", inputs=["File"])

    # Converts a file into text and performs basic cleaning (TextConverter node)
    text_converter = TextConverter(remove_numeric_tables=True)
    indexing_pipeline.add_node(text_converter, name="Text_converter", inputs=["Classifier.output_1"])

    # - Pre-processes the text by performing splits and adding metadata to the text (Preprocessor node)
    preprocessor = PreProcessor(
        clean_whitespace=True,
        clean_empty_lines=True,
        split_length=150,
        split_overlap=10,
        split_respect_sentence_boundary=True,
    )
    indexing_pipeline.add_node(preprocessor, name="Preprocessor", inputs=["Text_converter"])

    # - Writes the resulting documents into the document store
    indexing_pipeline.add_node(document_store, name="Document_Store", inputs=["Preprocessor"])

    # Initialize the Retriever
    retriever = EmbeddingRetriever(
        document_store=document_store, embedding_model="sentence-transformers/multi-qa-mpnet-base-dot-v1"
    )

    # Run the indexing pipeline with the documents and their metadata as input
    indexing_pipeline.run(file_paths=file_paths, meta=files_metadata)
    document_store.update_embeddings(retriever)

    # Define the prompt model used by the PromptNode
    prompt_open_ai = PromptModel(
        model_name_or_path="text-davinci-003", api_key="YOUR_API_KEY"
    )

    # Initialize the PromptNode that answers the question,
    # using the built-in "question-answering" prompt template
    qa_prompt_node = PromptNode(
        prompt_open_ai, default_prompt_template="question-answering"
    )  # alternative template: "question-answering_citing"

    # Time to define the pipeline:
    pipeline = Pipeline()
    pipeline.add_node(component=retriever, name="retriever", inputs=["Query"])
    pipeline.add_node(component=qa_prompt_node, name="qa_prompt_node", inputs=["retriever"])
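    # With the stock "question-answering" template, the following run fails with the
    # "Expected prompt params ['documents', 'questions']" exception quoted above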
    output = pipeline.run(query="Who is the father of Arya Stark?", params={"retriever": {"top_k": 10}})
    print(output["results"])



if __name__ == "__main__":
    launch_es(delete_existing=False)
    basic_qa_pipeline()
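
A possible workaround for error 2 (it does not address the missing documents from error 1) is a custom template bound to query instead of questions. This is only a sketch, assuming the PromptTemplate(name=..., prompt_text=...) API with $-prefixed variables from current Haystack 1.x; the prompt wording is illustrative:

from haystack.nodes.prompt import PromptTemplate

# Same idea as the built-in "question-answering" template, but bound to
# $query (which the query pipeline actually supplies) instead of $questions
qa_with_query = PromptTemplate(
    name="question-answering-with-query",
    prompt_text="Given the context please answer the question. "
    "Context: $documents; Question: $query; Answer:",
)

qa_prompt_node = PromptNode(prompt_open_ai, default_prompt_template=qa_with_query)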