We should change the inputs of the `AnswerExactMatchEvaluator` added in #7050 from `questions: List[str], ground_truth_answers: List[List[str]], predicted_answers: List[List[str]]` to `ground_truth_answers: List[str], predicted_answers: List[str]`.
This change will make all metrics in Haystack core and in integrations, statistical and model-based alike, consistent in the inputs they expect: answers are always `List[str]`, queries are always `List[str]`, and documents (contexts) are always `List[List[str]]`. It also simplifies the implementation of the new metrics and will allow us to move faster.
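For illustration, a minimal sketch of what the proposed flat-input interface could look like, assuming the Haystack 2.x `@component` decorator; the class body is illustrative, not the actual implementation:

```python
from typing import Dict, List

from haystack import component


@component
class AnswerExactMatchEvaluator:
    """Sketch of the proposed flat-input interface (illustrative only)."""

    @component.output_types(result=float)
    def run(self, ground_truth_answers: List[str], predicted_answers: List[str]) -> Dict[str, float]:
        # One ground-truth and one predicted answer per query, so the two
        # lists must be aligned and of equal length.
        if len(ground_truth_answers) != len(predicted_answers):
            raise ValueError("ground_truth_answers and predicted_answers must have the same length.")
        # Fraction of predictions that exactly match their ground truth.
        matches = sum(gt == pred for gt, pred in zip(ground_truth_answers, predicted_answers))
        return {"result": matches / len(ground_truth_answers)}
```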
Describe alternatives you've considered
Keeping the inputs as they are would leave them inconsistent with the model-based metrics and the integrations of evaluation frameworks. However, the behavior would match the exact match metric in Haystack 1.x and be more flexible for datasets with multiple ground-truth answers, such as SQuAD 2.0, and multiple predicted answers, such as our Reader's output; a sketch of how such datasets could still be adapted to the flat inputs follows.
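As a hedged sketch of that adaptation, multi-ground-truth data could be flattened in a small preprocessing step before the evaluator runs. The helper below (`flatten_ground_truths` is a hypothetical name, not part of Haystack) makes exact match on the flat lists behave like "correct if the prediction matches any ground truth":

```python
from typing import List, Tuple


def flatten_ground_truths(
    multi_ground_truths: List[List[str]], predicted_answers: List[str]
) -> Tuple[List[str], List[str]]:
    """Hypothetical adapter: reduce List[List[str]] ground truths to List[str].

    For each prediction, keep the matching ground truth if one exists,
    otherwise fall back to the first ground truth, so exact match on the
    returned flat lists counts a prediction correct if it matches any
    of its ground truths.
    """
    flat_ground_truths = []
    for ground_truths, prediction in zip(multi_ground_truths, predicted_answers):
        if prediction in ground_truths:
            flat_ground_truths.append(prediction)
        else:
            # "" covers empty answer lists, e.g. SQuAD 2.0's unanswerable questions.
            flat_ground_truths.append(ground_truths[0] if ground_truths else "")
    return flat_ground_truths, predicted_answers


# Example with two acceptable answers for the first question:
gts, preds = flatten_ground_truths(
    [["Paris", "the city of Paris"], ["1969"]],
    ["the city of Paris", "1970"],
)
# gts == ["the city of Paris", "1969"]; only the first pair is an exact match.
```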