From 1430197aea2bf015ca14311113e3fa592cbbba46 Mon Sep 17 00:00:00 2001
From: Julian Risch
Date: Wed, 22 Sep 2021 16:52:29 +0200
Subject: [PATCH] Replace "confidence" with "score" in answer

---
 docs/latest/components/reader.mdx | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/docs/latest/components/reader.mdx b/docs/latest/components/reader.mdx
index 8c7829077..4cc5ef747 100644
--- a/docs/latest/components/reader.mdx
+++ b/docs/latest/components/reader.mdx
@@ -211,9 +211,9 @@ you might like to try ALBERT XXL which has set SoTA performance on SQuAD 2.0.
 
 When printing the full results of a Reader, you will see that each prediction is accompanied
-by a value in the range of 0 to 1 reflecting the model's confidence in that prediction
+by a value in the range of 0 to 1 reflecting the model's confidence in that prediction.
 
-In the output of `print_answers()`, you will find the model confidence in dictionary key called `confidence`.
+In the output of `print_answers()`, you will find the model's confidence score in a dictionary key called `score`.
 
 ```python
 from haystack.utils import print_answers
 
@@ -229,14 +229,16 @@ print_answers(prediction, details="all")
             'She travels with her father, Eddard, to '
             "King's Landing when he is made Hand of the "
             'King. Before she leaves,',
-            'confidence': 0.9899835586547852,
+            'score': 0.9899835586547852,
             ...
         },
     ]
 }
 ```
 
-In order to align this probability score with the model's accuracy, finetuning needs to be performed
+The intuition behind this score is the following: if a model has an average confidence score of 0.9, we can expect its predictions to be correct in about 9 out of 10 cases.
+However, if the data the model needs to make predictions on differs strongly from its training data, we cannot guarantee that the confidence score and the model's accuracy are well aligned.
+In order to better align this confidence score with the model's accuracy, finetuning needs to be performed
 on a specific dataset. To this end, the reader has a method
 `calibrate_confidence_scores(document_store, device, label_index, doc_index, label_origin)`.
 The parameters of this method are the same as for the `eval()` method because the calibration of confidence scores is performed on a dataset that comes with gold labels.
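
For reference, a minimal, illustrative sketch of how the calibration step described in the updated text might be wired up. Only `calibrate_confidence_scores()` and its parameter names come from the documentation above; the import paths, the `FARMReader` model, the `ElasticsearchDocumentStore`, and the index/label-origin values are assumptions for illustration.

```python
# Illustrative sketch only: model name, document store, and index values are assumptions.
from haystack.document_stores import ElasticsearchDocumentStore
from haystack.nodes import FARMReader

# A document store that already holds evaluation documents and gold labels,
# e.g. previously written with document_store.add_eval_data(...).
document_store = ElasticsearchDocumentStore(index="eval_document", label_index="label")

reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", use_gpu=True)

# Calibrate the reader's confidence scores on the labelled evaluation data.
# The parameters mirror eval(); the values below are assumed defaults and should
# match whatever was used when the evaluation data was indexed.
reader.calibrate_confidence_scores(
    document_store=document_store,
    device="cuda",  # or "cpu"
    label_index="label",
    doc_index="eval_document",
    label_origin="gold_label",
)
```

After calibration, the values reported under the `score` key should track the model's accuracy more closely, as described in the paragraphs added by the patch.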