Reader returns the same answer twice in isolated eval mode (if there are two labels from the same document) #4092

Closed
1 task done
tstadel opened this issue Feb 7, 2023 · 0 comments · Fixed by #4114
Labels: P1 (High priority, add to the next sprint), topic:eval, topic:reader, type:bug (Something isn't working)


tstadel commented Feb 7, 2023

Describe the bug
Starting with 1.13.0, BaseReader.run no longer deduplicates documents in isolated node eval.

Previously we used

relevant_documents = {label.document.id: label.document for label in labels.labels}.values()

which deduplicates documents when several labels refer to the same document (but with different answer spans).
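To make the effect of the dict comprehension concrete, here is a minimal sketch. The `Document` and `Label` classes are simplified stand-ins written for this illustration (they are not the Haystack schema classes), and a plain list replaces `labels.labels`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    # Stand-in for illustration only; not haystack's Document class.
    id: str
    content: str


@dataclass(frozen=True)
class Label:
    # Stand-in carrying only the fields this sketch needs.
    document: Document
    answer: str


doc = Document(
    id="doc-1",
    content="Berlin is the capital of Germany. It has 3.7 million inhabitants.",
)

# Two labels on the same document, each with a different answer span.
labels = [
    Label(document=doc, answer="Berlin"),
    Label(document=doc, answer="3.7 million"),
]

# Keying by document id collapses labels that point at the same document.
relevant_documents = list(
    {label.document.id: label.document for label in labels}.values()
)

print(len(labels), len(relevant_documents))
```

Because dict keys are unique, both labels map to the single key `doc-1`, so only one document is handed on.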

Now we don't do this anymore:

relevant_documents = [label.document for label in labels.labels]
# Filter out empty documents
relevant_documents = [d for d in relevant_documents if d.content.strip() != ""]

This results in duplicate predictions because the Reader treats each copy of the same document as a separate document.
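The difference can be sketched without Haystack. Everything below is an assumption made for illustration: `toy_reader` is a hypothetical stand-in that emits one prediction per document it receives, and `relevant_documents_deduplicated` is one possible way to combine deduplication by id with the empty-document filter. It is not necessarily the change made in the linked fix.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Document:
    # Stand-in for illustration only; not haystack's Document class.
    id: str
    content: str


@dataclass(frozen=True)
class Label:
    # Stand-in carrying only the fields this sketch needs.
    document: Document
    answer: str


def toy_reader(documents: List[Document]) -> List[str]:
    # Hypothetical reader: one "prediction" per document passed in.
    return [f"answer from {d.id}" for d in documents]


def relevant_documents_deduplicated(labels: List[Label]) -> List[Document]:
    # Keep the first non-empty document seen for each id.
    by_id: Dict[str, Document] = {}
    for label in labels:
        doc = label.document
        if doc.content.strip() != "":
            by_id.setdefault(doc.id, doc)
    return list(by_id.values())


doc = Document(id="doc-1", content="Berlin is the capital of Germany.")
empty = Document(id="doc-2", content="   ")
labels = [
    Label(document=doc, answer="Berlin"),
    Label(document=doc, answer="Germany"),
    Label(document=empty, answer=""),
]

# List-based selection, as in the snippet above: filters empties, keeps duplicates.
with_duplicates = [l.document for l in labels if l.document.content.strip() != ""]

duplicated_predictions = toy_reader(with_duplicates)
unique_predictions = toy_reader(relevant_documents_deduplicated(labels))

print(duplicated_predictions)
print(unique_predictions)
```

In this sketch the list-based selection passes `doc-1` to the reader twice, while the id-keyed variant passes it once and still drops the empty document.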

Error message
None, but duplicate predictions.

Expected behavior
No duplicate predictions.

Additional context

To Reproduce

  1. Create two labels from the same document but with different answer spans.
  2. Run pipeline.eval on a single-node Reader pipeline in isolated mode with add_isolated_node_eval=True.
  3. Get the same predictions twice.
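When checking step 3, it can help to detect repeated predictions programmatically. The sketch below is only an illustration: `Prediction` and `find_duplicates` are hypothetical names, and the tuple fields are an assumed simplification of what a reader answer carries (answer text, document id, character offsets). Real eval output would first need to be mapped into this shape.

```python
from collections import Counter
from typing import List, NamedTuple


class Prediction(NamedTuple):
    # Hypothetical, simplified view of a reader answer.
    answer: str
    document_id: str
    start: int
    end: int


def find_duplicates(predictions: List[Prediction]) -> List[Prediction]:
    # Return each prediction that occurs more than once, in first-seen order.
    counts = Counter(predictions)
    return [p for p, n in counts.items() if n > 1]


predictions = [
    Prediction("Berlin", "doc-1", 0, 6),
    Prediction("Berlin", "doc-1", 0, 6),  # same answer, same document, same span
    Prediction("3.7 million", "doc-1", 41, 52),
]

duplicates = find_duplicates(predictions)
print(duplicates)
```

An empty result would match the expected behavior described above; a non-empty one indicates the duplication.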

FAQ Check

System:

  • OS: any
  • GPU/CPU: any
  • Haystack version (commit or version number): 1.13.1
  • DocumentStore: any
  • Reader: FARMReader (but should be the same for TransformersReader)
  • Retriever: any