
Difference in answers predictions using the same model in FarmReader and TransformersReader #248

Closed
antoniolanza1996 opened this issue Jul 18, 2020 · 6 comments


antoniolanza1996 commented Jul 18, 2020

Question
Why does the behaviour change if I use a model (e.g. deepset/roberta-base-squad2) wrapped in TransformersReader instead of FARMReader?

Additional context
If I run this code (imports shown as they were in Haystack at the time):

```python
from haystack.reader.farm import FARMReader
from haystack.reader.transformers import TransformersReader

model_name = "deepset/roberta-base-squad2"
reader1 = FARMReader(model_name_or_path=model_name, use_gpu=True)
reader2 = TransformersReader(model=model_name, tokenizer=model_name, use_gpu=-1)
```

I expected reader1 and reader2 to give the same predictions, but they don't. Is there a difference between FARMReader and TransformersReader that I am not taking into account?


brandenchan commented Jul 20, 2020

Hey, thanks for reporting the issue! It is possible for FARMReader and TransformersReader to give different predictions, but not to the degree we observed when we tried to replicate your issue. We are working on it and will let you know when we find out more.

@antoniolanza1996

Hey, thank you for the support. I have also committed my notebook in case it helps replicate the issue. You can find it here: https://github.com/antoniolanza1996/Haystack_TMP/blob/master/Haystack.ipynb

@brandenchan

Hey @antoniolanza1996, so we dug deeper and found that there are actually quite a few differences between the FARM and Transformers readers that would contribute to this kind of divergence. The bottom line is that we think both readers are currently working as designed, albeit differently from each other.

I suspect the biggest difference between the two models comes from the following two points:

There are also some more minor differences:

  • For one, the tokenization with RoBERTa models is slightly different. In practice this should not make a substantial difference, but it has to do with how the RoBERTa tokenizer treats words that have a whitespace before them versus words that don't. If you're interested in understanding this further, check out this issue (RoBERTa/GPT2 tokenization, huggingface/transformers#1196), this part of the FARM code (https://github.com/deepset-ai/FARM/blob/99c2694587e6a573012d1fa7c2b8d7eca8a888ab/farm/modeling/tokenization.py#L297), and contrast how the Transformers pipeline calls the tokenizer.

  • We've also seen that in the Transformers reader, the model can predict the exact same answer twice, with two different scores, if the answer occurs in a section of text that falls within two "sliding windows". This has to do with how documents are split into smaller passages that can be passed into the model. In FARM, we remove such duplicates.
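To make the second point concrete, here is a minimal, self-contained sketch of overlapping-window splitting and FARM-style duplicate removal. The function names, window sizes, and scores are all made up for illustration; this is not Haystack's actual code.

```python
# Hypothetical sketch: why the same answer can be predicted twice when a
# document is split into overlapping sliding windows, and how keeping only
# the best-scoring copy (as FARM does) removes the duplicate.

def split_into_windows(text: str, window: int, stride: int):
    """Split text into overlapping character windows of (start_offset, passage)."""
    return [(start, text[start:start + window])
            for start in range(0, max(1, len(text) - window + stride), stride)]

def deduplicate(predictions):
    """Keep only the highest-scoring prediction per (answer, document offset)."""
    best = {}
    for pred in predictions:
        key = (pred["answer"], pred["doc_offset"])
        if key not in best or pred["score"] > best[key]["score"]:
            best[key] = pred
    return sorted(best.values(), key=lambda p: p["score"], reverse=True)

# An answer span that lies in the overlap of two windows gets scored once per
# window; the scores below are illustrative only.
predictions = [
    {"answer": "Paris", "doc_offset": 120, "score": 0.91},  # from window 1
    {"answer": "Paris", "doc_offset": 120, "score": 0.87},  # from window 2 (overlap)
    {"answer": "Lyon",  "doc_offset": 300, "score": 0.40},
]

# After deduplication, the repeated "Paris" collapses to its best-scoring copy.
print(deduplicate(predictions))
```

A reader that skips this deduplication step (as the Transformers reader did here) can legitimately return the same answer text twice with slightly different scores.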

Not sure what level of familiarity you have with QA systems, but if you want a primer on sliding windows, logit calculations, and aggregation, I'd point you to this: https://medium.com/deepset-ai/modern-question-answering-systems-explained-4d0913744097

I hope this helps and if you'd like me to clarify anything that I mentioned above, just let me know. I'd be more than happy to try and explain.

@antoniolanza1996

@brandenchan, now it makes sense. Thank you for the support. I am going to close the issue.

@su2twtbridge

In a performance test I found that FARMReader is faster than TransformersReader, while at the same time FARM's memory consumption is higher than Transformers'.
Can someone please shed some light on these two observations?

@antoniolanza1996

Hey @su2twtbridge,
your comment will probably go unseen in this closed issue. Please open a new issue and the Haystack team will surely reply to you.
