Reader returns the same answer twice in isolated eval mode (if there are two labels from the same document) #4092

tstadel · 2023-02-07T20:13:11Z

Describe the bug
Starting with 1.13.0, BaseReader.run no longer deduplicates documents during isolated node evaluation.

Previously we used

relevant_documents = {label.document.id: label.document for label in labels.labels}.values()

which deduplicates documents when several labels refer to the same document (but with different answer spans).

Now we don't do this anymore:

haystack/haystack/nodes/reader/base.py

Lines 120 to 122 in a2c160e

relevant_documents = [label.document for label in labels.labels]
# Filter out empty documents
relevant_documents = [d for d in relevant_documents if d.content.strip() != ""]

This results in duplicate predictions, because the Reader receives the same document once per label and treats each copy as a separate document.
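To make the difference concrete, here is a minimal sketch that combines the earlier id-keyed deduplication with the empty-document filter. The `Document` and `Label` dataclasses and the helper name `relevant_documents_deduplicated` are hypothetical stand-ins rather than Haystack's own classes, and this is not necessarily how the actual fix is implemented.

```python
from dataclasses import dataclass
from typing import List


# Hypothetical stand-ins for Haystack's Document and Label classes,
# reduced to the fields this issue touches.
@dataclass
class Document:
    id: str
    content: str


@dataclass
class Label:
    document: Document
    answer: str


def relevant_documents_deduplicated(labels: List[Label]) -> List[Document]:
    """Keep one non-empty Document per document id, in first-seen order."""
    # Dict keys are unique, so labels sharing a document id collapse to one entry.
    unique = {label.document.id: label.document for label in labels}
    # Filter out empty documents, as the current code does.
    return [d for d in unique.values() if d.content.strip() != ""]


doc = Document(id="doc-1", content="Berlin is the capital of Germany.")
empty = Document(id="doc-2", content="   ")
labels = [
    Label(document=doc, answer="Berlin"),
    Label(document=doc, answer="Germany"),
    Label(document=empty, answer=""),
]

# Behavior described in this issue: one entry per label, so doc-1 appears twice.
per_label = [l.document for l in labels if l.document.content.strip() != ""]
print(len(per_label))  # 2

# Deduplicated: one entry per document id.
print([d.id for d in relevant_documents_deduplicated(labels)])  # ['doc-1']
```

Keying on the document id keeps the first-seen order of documents, since Python dicts preserve insertion order.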

Error message
None, but duplicate predictions.

Expected behavior
No duplicate predictions.

Additional context

To Reproduce

Create two labels from the same document but with different answer spans.
Run pipeline.eval on a single-node Reader pipeline with add_isolated_node_eval=True.
Observe that the isolated evaluation returns the same predictions twice.

FAQ Check

Have you had a look at our new FAQ page?

System:

OS: any
GPU/CPU: any
Haystack version (commit or version number): 1.13.1
DocumentStore: any
Reader: FARMReader (but should be the same for TransformersReader)
Retriever: any


sjrl added type:bug Something isn't working topic:eval topic:reader labels Feb 8, 2023

julian-risch assigned bogdankostic Feb 8, 2023

julian-risch added the P1 High priority, add to the next sprint label Feb 8, 2023

bogdankostic mentioned this issue Feb 9, 2023

fix: Deduplicate same Documents in isolated evaluation of Reader #4114

Merged


bogdankostic closed this as completed in #4114 Feb 10, 2023
