Recall with DocumentRecallEvaluator is not calculated correctly #7867

wirtsi · 2024-06-14T13:22:58Z

Describe the bug
I might have gotten this wrong, but I think the recall is not calculated correctly.

Given this code

from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator

evaluator = DocumentRecallEvaluator()
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="9th century"), Document(content="9th")],
    ],
    retrieved_documents=[
        [Document(content="9th century"), Document(content="10th century"), Document(content="9th")],
    ],
)
print(result)

I get
{'score': 1.0, 'individual_scores': [1.0]}

If I now change the first Document in retrieved_documents to, for example, Document(content="1st century"), the result is still
{'score': 1.0, 'individual_scores': [1.0]}

Expected behavior
Since only one of the two ground-truth documents is then retrieved, I would expect a score of 0.5, not 1.0.

System:

OS: macOS
GPU/CPU: M1
Haystack version (commit or version number): 2.2.1

anakin87 · 2024-06-14T13:59:52Z

Hello!

From the docs:

When initializing a DocumentRecallEvaluator, you can set the mode parameter to
RecallMode.SINGLE_HIT or RecallMode.MULTI_HIT. By default, RecallMode.SINGLE_HIT is used.

RecallMode.SINGLE_HIT means that any of the ground truth documents need to be retrieved to count as a correct retrieval with a recall score of 1. A single retrieved document can achieve the full score.

RecallMode.MULTI_HIT means that all of the ground truth documents need to be retrieved to count as a correct retrieval with a recall score of 1. The number of retrieved documents must be at least the number of ground truth documents to achieve the full score.

So, in this case (single hit), the result of 1.0 is correct.
If you set the mode to multi hit, you get a result of 0.5.
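To make the difference concrete, here is a minimal plain-Python sketch of the two modes as the docs quote describes them. It is not Haystack's actual implementation: the helper names `single_hit_recall` and `multi_hit_recall` are hypothetical, and documents are compared by their content strings only, which is an assumption made for illustration.

```python
from typing import List


def single_hit_recall(ground_truth: List[str], retrieved: List[str]) -> float:
    """Hypothetical helper: 1.0 if at least one ground-truth document was retrieved, else 0.0."""
    retrieved_set = set(retrieved)
    return 1.0 if any(doc in retrieved_set for doc in ground_truth) else 0.0


def multi_hit_recall(ground_truth: List[str], retrieved: List[str]) -> float:
    """Hypothetical helper: fraction of distinct ground-truth documents that were retrieved."""
    truths = set(ground_truth)
    if not truths:
        # Assumption: with no ground truth there is nothing to recall, so report 0.0.
        return 0.0
    return len(truths & set(retrieved)) / len(truths)


# The modified example from the issue: the first retrieved document
# is "1st century" instead of "9th century".
ground_truth = ["9th century", "9th"]
retrieved = ["1st century", "10th century", "9th"]

print(single_hit_recall(ground_truth, retrieved))  # 1.0: one hit is enough
print(multi_hit_recall(ground_truth, retrieved))   # 0.5: one of two ground truths found
```

In Haystack itself, the behavior is selected through the `mode` parameter mentioned in the docs quote above, so the 0.5 the reporter expected corresponds to the multi-hit mode rather than the default.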

I am closing the issue. Feel free to reopen it if anything is unclear...

wirtsi · 2024-06-14T14:25:10Z

Damn, you are right 😮‍💨 Thanks

anakin87 closed this as completed Jun 14, 2024
