Recall with DocumentRecallEvaluator is not calculated correctly #7867

Closed
1 task done
wirtsi opened this issue Jun 14, 2024 · 2 comments

@wirtsi

wirtsi commented Jun 14, 2024

Describe the bug
I might have gotten this wrong, but I think the recall is not calculated correctly.

Given this code

from haystack import Document
from haystack.components.evaluators import DocumentRecallEvaluator

evaluator = DocumentRecallEvaluator()
result = evaluator.run(
    ground_truth_documents=[
        [Document(content="9th century"), Document(content="9th")],
    ],
    retrieved_documents=[
        [Document(content="9th century"), Document(content="10th century"), Document(content="9th")],
    ],
)
print(result)

I get
{'score': 1.0, 'individual_scores': [1.0]}

If I now change the first Document in retrieved_documents to, for example, Document(content="1st century"), the result is still
{'score': 1.0, 'individual_scores': [1.0]}

Expected behavior
Since only one of the two ground truth documents is retrieved, I would expect the score to be 0.5, not 1.0.

System:

  • OS: macOS
  • GPU/CPU: M1
  • Haystack version (commit or version number): 2.2.1
  • DocumentStore:
  • Reader:
  • Retriever:
@anakin87
Member

Hello!

From the docs:

When initializing a DocumentRecallEvaluator, you can set the mode parameter to
RecallMode.SINGLE_HIT or RecallMode.MULTI_HIT. By default, RecallMode.SINGLE_HIT is used.

RecallMode.SINGLE_HIT means that any of the ground truth documents need to be retrieved to count as a correct retrieval with a recall score of 1. A single retrieved document can achieve the full score.

RecallMode.MULTI_HIT means that all of the ground truth documents need to be retrieved to count as a correct retrieval with a recall score of 1. The number of retrieved documents must be at least the number of ground truth documents to achieve the full score.

So, in this case (single hit), the result of 1.0 is correct.
If you set the mode to multi hit, you get a result of 0.5.
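
To make the difference between the two modes concrete, here is a minimal plain-Python sketch of how each score could be computed for a single query. It is an illustration based on the docs quoted above, not Haystack's actual implementation: the helper names recall_single_hit and recall_multi_hit are hypothetical, documents are reduced to their content strings, and the ground truth list is assumed to be non-empty.

```python
# Simplified illustration of the two recall modes (not Haystack's code).
# Documents are represented by their content strings only.

def recall_single_hit(ground_truth, retrieved):
    """Return 1.0 if at least one ground truth document was retrieved, else 0.0."""
    return float(bool(set(ground_truth) & set(retrieved)))


def recall_multi_hit(ground_truth, retrieved):
    """Return the fraction of unique ground truth documents that were retrieved.

    Assumes ground_truth is non-empty.
    """
    truths = set(ground_truth)
    return len(truths & set(retrieved)) / len(truths)


# The modified example from the issue: "9th century" was replaced by "1st century".
ground_truth = ["9th century", "9th"]
retrieved = ["1st century", "10th century", "9th"]

single = recall_single_hit(ground_truth, retrieved)
multi = recall_multi_hit(ground_truth, retrieved)
print(single, multi)  # prints "1.0 0.5"
```

In single hit mode, the one match on "9th" is enough for a full score, whereas multi hit mode counts only one of the two ground truth documents as found. In Haystack itself, the switch is the mode parameter mentioned in the docs quote (RecallMode.MULTI_HIT).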


I am closing the issue. Feel free to reopen it if anything is unclear...

@wirtsi
Copy link
Author

wirtsi commented Jun 14, 2024

Damn, you are right 😮‍💨 Thanks
