
Commit
Merge pull request #166 from deepset-ai/confidence-score-update-docs
Replace "confidence" with "score" in answer
julian-risch committed Sep 23, 2021
2 parents 1f87136 + 1430197 commit 3905847
Showing 1 changed file with 6 additions and 4 deletions.
10 changes: 6 additions & 4 deletions docs/latest/components/reader.mdx
@@ -211,9 +211,9 @@ you might like to try ALBERT XXL which has set SoTA performance on SQuAD 2.0.
 
 When printing the full results of a Reader,
 you will see that each prediction is accompanied
-by a value in the range of 0 to 1 reflecting the model's confidence in that prediction
+by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.
 
-In the output of `print_answers()`, you will find the model confidence in dictionary key called `confidence`.
+In the output of `print_answers()`, you will find the model's confidence score in a dictionary key called `score`.
 
 ```python
 from haystack.utils import print_answers
@@ -229,14 +229,16 @@ print_answers(prediction, details="all")
             'She travels with her father, Eddard, to '
             "King's Landing when he is made Hand of the "
             'King. Before she leaves,',
-            'confidence': 0.9899835586547852,
+            'score': 0.9899835586547852,
             ...
         },
     ]
 }
 ```
 
-In order to align this probability score with the model's accuracy, finetuning needs to be performed
+The intuition behind this score is the following: if a model has, on average, a confidence score of 0.9, we can expect its predictions to be correct in about 9 out of 10 cases.
+However, if the model's training data strongly differs from the data it needs to make predictions on, we cannot guarantee that the confidence score and the model's accuracy are well aligned.
+In order to better align this confidence score with the model's accuracy, finetuning needs to be performed
 on a specific dataset.
 To this end, the reader has a method `calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin)`.
 The parameters of this method are the same as for the `eval()` method because the calibration of confidence scores is performed on a dataset that comes with gold labels.
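For context, below is a minimal sketch of how the renamed `score` field can be inspected in practice. Only `print_answers(prediction, details="all")` and the `score` key come from the documentation above; the pipeline setup (document store, retriever, reader, the Haystack 1.x-style import paths, and the `run()` parameters) is an illustrative assumption, and older releases use slightly different module names.

```python
# Minimal sketch (assumed setup, not part of this commit): build a small
# extractive QA pipeline and inspect the confidence score of the top answer.
# Import paths follow Haystack 1.x; 0.x releases used different module names
# and run() parameters.
from haystack.document_stores import InMemoryDocumentStore
from haystack.nodes import FARMReader, TfidfRetriever
from haystack.pipelines import ExtractiveQAPipeline
from haystack.utils import print_answers

document_store = InMemoryDocumentStore()
document_store.write_documents([
    {"content": "Arya Stark is the daughter of Eddard Stark. She travels with her "
                "father to King's Landing when he is made Hand of the King."}
])

retriever = TfidfRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=False)
pipe = ExtractiveQAPipeline(reader=reader, retriever=retriever)

prediction = pipe.run(
    query="Who is the father of Arya Stark?",
    params={"Retriever": {"top_k": 10}, "Reader": {"top_k": 5}},
)

# As described above, each answer now carries its confidence under `score`
# (previously `confidence`). Depending on the Haystack version, an answer is
# either a dict with a "score" key or an Answer object with a .score attribute.
print_answers(prediction, details="all")
```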

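To make the calibration step concrete, here is a sketch of how `calibrate_confidence_scores()` might be called. Only the method signature `calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin)` is taken from the documentation above; the document store, the `add_eval_data()` preparation step, the file path, and the index names are illustrative assumptions, and, as with `eval()`, a dataset with gold labels is required.

```python
# Sketch only: calibrate a FARMReader's confidence scores on a labeled dataset.
# Assumes a running Elasticsearch instance and a SQuAD-format annotation file;
# import paths follow Haystack 1.x.
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import FARMReader

document_store = ElasticsearchDocumentStore()
reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2")

# Load gold labels into dedicated indices, the same way a dataset is prepared
# for reader.eval(). The file path and index names are placeholders.
document_store.add_eval_data(
    filename="data/squad_dev.json",
    doc_index="eval_docs",
    label_index="eval_labels",
)

# Calibrate the confidence scores against the gold labels so that, for example,
# answers with a score around 0.9 are correct in roughly 9 out of 10 cases.
# `device` and `label_origin` are left at their defaults here.
reader.calibrate_confidence_scores(
    document_store=document_store,
    label_index="eval_labels",
    doc_index="eval_docs",
)
```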