documents get overridden by Shaper #4230

Closed
1 task done
tstadel opened this issue Feb 22, 2023 · 0 comments · Fixed by #4231
tstadel commented Feb 22, 2023

Describe the bug
When a pipeline uses a Shaper with join_documents to join the retrieved documents for a question-answering PromptNode, the original documents that served as PromptNode input are no longer easily accessible, because the Shaper overwrites them. For example, we cannot link an Answer back to its input documents.

This is a pipeline which has the problem:

version: '1.13.2'
components:
  - name: DocumentStore
    type: InMemoryDocumentStore
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 3
  - name: PromptModel
    type: PromptModel
    params:
      model_name_or_path: google/flan-t5-large
      model_kwargs:
        model_kwargs:
          num_return_sequences: 3
          num_beams: 3
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: question-answering
      model_name_or_path: PromptModel
  - name: InputDocumentShaper
    type: Shaper
    params:
      func: join_documents
      inputs:
        documents: documents
      outputs:
        - documents
      params:
        delimiter: " - "
  - name: InputQuestionsShaper
    type: Shaper
    params:
      func: value_to_list
      inputs:
        value: query
      outputs:
        - questions
      params:
        target_list: [1]
  - name: OutputAnswerShaper
    type: Shaper
    params:
      func: strings_to_answers
      inputs:
        strings: results
      outputs:
        - answers
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: InputDocumentShaper
        inputs: [Retriever]
      - name: InputQuestionsShaper
        inputs: [InputDocumentShaper]
      - name: PromptNode
        inputs: [InputQuestionsShaper]
      - name: OutputAnswerShaper
        inputs: [PromptNode]

This pipeline returns only the joined document under documents. Its id does not match any document id present in the document store.
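The effect can be reproduced without Haystack. The following is a minimal sketch, not Haystack code: Doc is a hypothetical stand-in for a Haystack Document, and it assumes that a document id is derived from the document content, so a joined document necessarily gets an id that the store has never seen.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Simplified, hypothetical stand-in for a Haystack Document."""
    content: str

    @property
    def id(self) -> str:
        # Assumption: the id is derived from the content, so new content means a new id.
        return hashlib.md5(self.content.encode("utf-8")).hexdigest()


def join_documents(documents, delimiter=" "):
    """Merge all documents into one new document, mimicking the Shaper function of the same name."""
    return [Doc(delimiter.join(d.content for d in documents))]


# Documents as a retriever might return them.
retrieved = [Doc("Berlin is in Germany."), Doc("Paris is in France."), Doc("Rome is in Italy.")]
store_ids = {d.id for d in retrieved}

# The Shaper reads from "documents" and writes its result back to the same key.
state = {"documents": retrieved}
state["documents"] = join_documents(state["documents"], delimiter=" - ")

joined = state["documents"][0]
print(joined.content)
print(joined.id in store_ids)  # the joined id matches nothing in the store
```

Because inputs and outputs both name documents, the three retrieved documents are gone from the pipeline output after this step.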

And this is a pipeline that works around the problem by first storing the original documents under a separate key (documents_orig) and restoring them at the end:

version: '1.13.2'
components:
  - name: DocumentStore
    type: InMemoryDocumentStore
  - name: Retriever
    type: BM25Retriever
    params:
      document_store: DocumentStore
      top_k: 3
  - name: PromptModel
    type: PromptModel
    params:
      model_name_or_path: google/flan-t5-large
      model_kwargs:
        model_kwargs:
          num_return_sequences: 3
          num_beams: 3
  - name: PromptNode
    type: PromptNode
    params:
      default_prompt_template: question-answering
      model_name_or_path: PromptModel
  - name: InputDocumentShaper_1
    type: Shaper
    params:
      func: rename
      inputs:
        value: documents
      outputs:
        - documents_orig
  - name: InputDocumentShaper_2
    type: Shaper
    params:
      func: join_documents
      inputs:
        documents: documents
      outputs:
        - documents
      params:
        delimiter: " - "
  - name: InputQuestionsShaper
    type: Shaper
    params:
      func: value_to_list
      inputs:
        value: query
      outputs:
        - questions
      params:
        target_list: [1]
  - name: OutputAnswerShaper
    type: Shaper
    params:
      func: strings_to_answers
      inputs:
        strings: results
      outputs:
        - answers
  - name: OutputDocumentShaper
    type: Shaper
    params:
      func: rename
      inputs:
        value: documents_orig
      outputs:
        - documents
pipelines:
  - name: query
    nodes:
      - name: Retriever
        inputs: [Query]
      - name: InputDocumentShaper_1
        inputs: [Retriever]
      - name: InputDocumentShaper_2
        inputs: [InputDocumentShaper_1]
      - name: InputQuestionsShaper
        inputs: [InputDocumentShaper_2]
      - name: PromptNode
        inputs: [InputQuestionsShaper]
      - name: OutputAnswerShaper
        inputs: [PromptNode]
      - name: OutputDocumentShaper
        inputs: [OutputAnswerShaper]

This pipeline returns the original documents under documents, with ids that are present in the document store.
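The data flow of this workaround can be sketched in plain Python. This is only an illustration under stated assumptions: run_shaper is a hypothetical helper that imitates the inputs/outputs/params mapping of the YAML above, documents are plain dicts, and the PromptNode step is omitted.

```python
def rename(value):
    """Return the value unchanged so it can be stored under a new key."""
    return value


def join_documents(documents, delimiter=" "):
    """Collapse a list of document dicts into one joined document (it has no store id)."""
    return [{"id": None, "content": delimiter.join(d["content"] for d in documents)}]


def run_shaper(state, func, inputs, outputs, params=None):
    """Hypothetical mini-Shaper: read `inputs` from state, call `func`, write the result to `outputs`."""
    kwargs = {arg: state[key] for arg, key in inputs.items()}
    kwargs.update(params or {})
    new_state = dict(state)
    new_state[outputs[0]] = func(**kwargs)  # an existing key with the same name is overwritten
    return new_state


state = {
    "documents": [
        {"id": "a", "content": "first"},
        {"id": "b", "content": "second"},
        {"id": "c", "content": "third"},
    ]
}

# InputDocumentShaper_1: keep the retrieved documents under another key.
state = run_shaper(state, rename, {"value": "documents"}, ["documents_orig"])
# InputDocumentShaper_2: join the documents for the prompt; this overwrites "documents".
state = run_shaper(state, join_documents, {"documents": "documents"}, ["documents"], {"delimiter": " - "})
prompt_input = state["documents"][0]["content"]
# OutputDocumentShaper: restore the original documents once the PromptNode has run.
state = run_shaper(state, rename, {"value": "documents_orig"}, ["documents"])

print(prompt_input)
print([d["id"] for d in state["documents"]])
```

The cost of the workaround is visible here: two extra rename steps whose only job is to undo the overwrite.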

Expected behavior
There should be a way to control whether Shaper overwrites existing outputs.
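One possible shape for such a control, as a rough sketch only: apply_outputs and its publish flag are hypothetical names invented for this illustration and do not describe the fix in #4231 or any existing Shaper parameter. The idea is that results are always handed to downstream nodes through a separate context, and only replace the pipeline-level keys when publishing is enabled.

```python
def apply_outputs(state, results, publish=True):
    """Merge Shaper results into the pipeline state.

    `publish` is a hypothetical switch for illustration. Results are always
    stored in a node-to-node "context" so that a downstream node can read
    them; they replace the top-level keys only when `publish` is True.
    """
    context = dict(state.get("context", {}))
    context.update(results)
    merged = dict(state)
    merged["context"] = context
    if publish:
        merged.update(results)
    return merged


state = {"documents": ["doc-1", "doc-2", "doc-3"]}
results = {"documents": ["doc-1 - doc-2 - doc-3"]}

published = apply_outputs(state, results, publish=True)
kept = apply_outputs(state, results, publish=False)

print(published["documents"])
print(kept["documents"])
print(kept["context"]["documents"])
```

With publish=False the joined text is still available to the next node through the context, while the original documents remain in the final output.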

FAQ Check

System:

  • Haystack version (commit or version number): 1.13.2