
Add switch to QA pred head for ranking by confidence scores #836

Merged
merged 3 commits into master on Aug 18, 2021

Conversation

@julian-risch (Member) commented on Aug 17, 2021

QACandidates contain both a score and a confidence field, but so far only score was used to rank QACandidates.
This PR keeps ranking QACandidates by score as the default but also allows setting use_confidence_scores_for_ranking in QAPredictionHead, which activates ranking QACandidates by confidence.

I think it's better to keep both fields, score and confidence, in QACandidates instead of dropping one of them entirely. That way we stay flexible and maintain backwards compatibility.

A new test case checks that results are ranked as expected with and without setting use_confidence_scores_for_ranking.

closes #808
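
Roughly, the new switch amounts to choosing the sort key for the candidate list. A minimal sketch (the field names follow QACandidate, but the sorting helper and its signature are illustrative assumptions, not FARM's actual implementation):

```python
from dataclasses import dataclass

@dataclass
class QACandidate:
    answer: str
    score: float       # unbounded, logit-based
    confidence: float  # normalized to [0, 1]

def rank_candidates(candidates, use_confidence_scores_for_ranking=False):
    """Sort candidates descending by score (default) or by confidence."""
    key = (lambda c: c.confidence) if use_confidence_scores_for_ranking \
          else (lambda c: c.score)
    return sorted(candidates, key=key, reverse=True)
```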

@julian-risch changed the title from "WIP: Add switch to QA pred head for ranking by confidence scores" to "Add switch to QA pred head for ranking by confidence scores" on Aug 18, 2021
@julian-risch marked this pull request as ready for review on August 18, 2021 12:56

@tholor (Member) left a comment:

LGTM.
Small thing: not directly related to this PR, but I think it's a great opportunity to add a bit more explanation of the different scores we have (see comment below).

@@ -963,6 +964,8 @@ def __init__(self, layer_dims=[768,2],
:type duplicate_filtering: int
:param temperature_for_confidence: The divisor that is used to scale logits to calibrate confidence scores
:type temperature_for_confidence: float
:param use_confidence_scores_for_ranking: Whether to sort answers by confidence score (normalized between 0 and 1) or by standard score (unbounded)

@tholor (Member):


Can we document somewhere here a bit more clearly what these different scores exactly are and how they are calculated? If I remember correctly, we have these three cases (illustrated in the sketch below):
a) score = start logit + end logit (unbounded)
b) confidence (default) = logits scaled to 0-1, incorporating no_answer
c) confidence (calibrated) = same as b), but the logits are first scaled by a learned temperature parameter
I am sure I will forget about this in a couple of weeks, and it would be helpful to have it documented for others. Probably it's best to do that in the general prediction head docstring (+ Haystack's FARMReader).
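
To make the three cases concrete, here is a hedged numerical sketch (the function name, shapes, and normalization details are assumptions for illustration, not FARM's actual implementation):

```python
import numpy as np

def qa_scores(span_logits, span_idx, temperature=1.0):
    """Compute the three score variants for one candidate answer span.

    span_logits: (start_logit + end_logit) for every candidate span,
    including the no_answer span; span_idx selects the span of interest.
    """
    # a) score: raw summed start/end logits, unbounded
    score = span_logits[span_idx]

    # b) confidence (default): softmax over all spans squashes logits
    #    to [0, 1]; the no_answer span takes part in the normalization
    probs = np.exp(span_logits - span_logits.max())
    probs /= probs.sum()
    confidence = probs[span_idx]

    # c) confidence (calibrated): divide the logits by the learned
    #    temperature_for_confidence before the softmax (temperature scaling)
    scaled = span_logits / temperature
    probs_cal = np.exp(scaled - scaled.max())
    probs_cal /= probs_cal.sum()
    confidence_calibrated = probs_cal[span_idx]

    return score, confidence, confidence_calibrated
```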


@julian-risch (Member, Author):


Good point. I added an explanation of the three kinds of scores to the doc string of the QAPredictionHead.

@julian-risch merged commit 2baa409 into master on Aug 18, 2021

Successfully merging this pull request may close these issues:

- allow switching between confidence scores in reader (#808)