This repository has been archived by the owner on Oct 20, 2022. It is now read-only.

Replace "confidence" with "score" in answer #166

Merged 1 commit on Sep 23, 2021
Replace "confidence" with "score" in answer
julian-risch committed Sep 22, 2021
commit 1430197aea2bf015ca14311113e3fa592cbbba46
10 changes: 6 additions & 4 deletions docs/latest/components/reader.mdx
@@ -211,9 +211,9 @@ you might like to try ALBERT XXL which has set SoTA performance on SQuAD 2.0.

When printing the full results of a Reader,
you will see that each prediction is accompanied
- by a value in the range of 0 to 1 reflecting the model's confidence in that prediction
+ by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.

- In the output of `print_answers()`, you will find the model confidence in dictionary key called `confidence`.
+ In the output of `print_answers()`, you will find the model's confidence score under the dictionary key `score`.

```python
from haystack.utils import print_answers
@@ -229,14 +229,16 @@ print_answers(prediction, details="all")
'She travels with her father, Eddard, to '
"King's Landing when he is made Hand of the "
'King. Before she leaves,',
- 'confidence': 0.9899835586547852,
+ 'score': 0.9899835586547852,
...
},
]
}
```
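
As a minimal sketch of how this key can be used, the snippet below reads the score of the top-ranked answer from a `prediction` dictionary shaped like the output above; the 0.5 threshold is an arbitrary illustration, not a recommended value.

```python
# Read the confidence score of the top-ranked answer from the prediction
# dictionary shown above (assumes `prediction` holds a Reader's output).
top_answer = prediction["answers"][0]

print(f"Answer: {top_answer['answer']}")
print(f"Score: {top_answer['score']:.4f}")

# Arbitrary illustrative threshold for filtering low-confidence answers.
if top_answer["score"] < 0.5:
    print("Low confidence, consider discarding this answer.")
```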

- In order to align this probability score with the model's accuracy, finetuning needs to be performed
+ The intuition behind this score is the following: if a model has an average confidence score of 0.9, we can expect its predictions to be correct in about 9 out of 10 cases.
+ However, if the model's training data differs strongly from the data it needs to make predictions on, we cannot guarantee that the confidence score and the model's accuracy are well aligned.
+ In order to better align this confidence score with the model's accuracy, finetuning needs to be performed
on a specific dataset.
To this end, the reader has a method `calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin)`.
The parameters of this method are the same as for the `eval()` method because the calibration of confidence scores is performed on a dataset that comes with gold labels.
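
As an illustration, a calibration call might look like the following sketch. Here `reader` and `document_store` are assumed to be an already initialized Reader (e.g., a `FARMReader`) and a document store that holds the evaluation documents and gold labels; the index names and device string are illustrative values, not prescribed ones.

```python
# Minimal sketch: calibrate the Reader's confidence scores on a labeled
# dataset. `reader` and `document_store` are assumed to exist already;
# the index names and device below are illustrative, not prescribed.
reader.calibrate_confidence_scores(
    document_store=document_store,
    device="cuda",
    label_index="label",
    doc_index="eval_document",
    label_origin="gold_label",
)
```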