refactor: remove Inferencer multiprocessing #3283

Merged: 4 commits into deepset-ai:main on Oct 4, 2022

Conversation

vblagoje
Member

Related Issues

Proposed Changes:

Inferencer multiprocessing, just as in #3087, recently started to cause lockups due to various inconsistencies around multiprocessing in torch. Some torch versions worked well, while others deadlocked outright.

As multiprocessing brings no performance benefit, it is best to remove it altogether. Several independent examples show that multiprocessing actually slows down inferencing.
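
For illustration, here is a minimal sketch of what the refactor amounts to. This is not the actual infer.py diff: the function name, argument names, and the forward_fn callable are placeholders. The point is simply that the Pool-based fan-out over chunks of input dicts is replaced by a plain sequential loop in the current process.

# Illustrative sketch only, not the merged infer.py code. forward_fn stands in
# for the model's predict call on one batch of dicts.
from typing import Any, Callable, Dict, Iterable, Iterator, List

def inference_sequential(
    dicts: Iterable[Dict[str, Any]],
    batch_size: int,
    forward_fn: Callable[[List[Dict[str, Any]]], List[Dict[str, Any]]],
) -> Iterator[Dict[str, Any]]:
    # No multiprocessing.Pool and no chunking across workers: every batch runs
    # in the current process, which sidesteps the torch fork/deadlock issues.
    batch: List[Dict[str, Any]] = []
    for d in dicts:
        batch.append(d)
        if len(batch) == batch_size:
            yield from forward_fn(batch)
            batch = []
    if batch:
        yield from forward_fn(batch)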

How did you test it?

CI tests, and performance tests such as this Colab notebook

Notes for the reviewer

TBD

Checklist

@vblagoje vblagoje requested a review from a team as a code owner September 27, 2022 07:57
@vblagoje vblagoje requested review from masci and removed request for a team September 27, 2022 07:57
@vblagoje vblagoje mentioned this pull request Sep 27, 2022
@sjrl
Contributor

sjrl commented Sep 28, 2022

Hi @vblagoje, thanks for the work! An issue recently opened by @danielbichuetti (Issue #3289) may also be related to this. It's possible that the multiprocessing in the FARMReader is preventing the multiprocessing in the Inferencer from working as expected. Edit: Sorry, I misunderstood; the multiprocessing in the FARMReader and the Inferencer are the same.

Would it be possible to do the same CI performance check using the TransformerReader instead and see if this time difference is still observed? Edit: Actually, this isn't really relevant since the TransformerReader does not use the same Inferencer as the FARMReader so it will not have this multiprocessing performance problem.

@vblagoje
Copy link
Member Author

Yeah, totally. Have a look at the notebook I made to measure FARMReader performance with and without multiprocessing. It should be relatively simple to create a TransformerReader and run inference with it instead of FARMReader; I'll do that tomorrow. We should also do measurements on CPU for both versions.
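
For reference, the comparison in the notebook boils down to something like the following (a hypothetical snippet, assuming the pre-change FARMReader API with its num_processes argument; the model name and documents are placeholders):

import time

from haystack import Document
from haystack.nodes import FARMReader

# Placeholder corpus; the notebook uses a real dataset instead.
docs = [Document(content="Berlin is the capital of Germany.")] * 100

# Assumption: in the old FARM-style API, num_processes=0 disables
# multiprocessing and None lets the Inferencer spin up its own worker pool.
for num_processes in (0, None):
    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2", num_processes=num_processes)
    start = time.perf_counter()
    reader.predict(query="What is the capital of Germany?", documents=docs, top_k=3)
    print(f"num_processes={num_processes}: {time.perf_counter() - start:.1f}s")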

@sjrl
Contributor

sjrl commented Sep 28, 2022

@vblagoje Yeah, sounds good! Also, I'm in the US right now, hence why I'm still online, so please feel free to respond tomorrow at a more reasonable time in Berlin!

@sjrl
Contributor

sjrl commented Sep 28, 2022

@vblagoje See comment #3272 (comment) on the original issue opened by @Timoeller.

@masci masci changed the title Remove Inferencer multiprocessing refactor: remove Inferencer multiprocessing Sep 29, 2022
@vblagoje vblagoje requested a review from sjrl September 29, 2022 17:10
@sjrl
Contributor

sjrl commented Sep 30, 2022

@vblagoje I think it is the correct move to remove multiprocessing from the Inferencer, since it seems to conflict with multiprocessing from the FastTokenizers in HF.

My only suggestion would be to go through infer.py and add deprecation warnings everywhere the variable multiprocessing_chunksize appears, since it is no longer used: for example, in functions like inference_from_file, inference_from_objects, etc., in both the Inferencer and QAInferencer classes.
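
Concretely, the suggested deprecation could look roughly like this (a sketch of the pattern only, not the merged code; the method signature is abbreviated and the warning text is illustrative):

import warnings
from typing import Optional

def inference_from_file(self, file: str, multiprocessing_chunksize: Optional[int] = None, **kwargs):
    # Keep the argument for backward compatibility, but warn that it is ignored
    # now that the Inferencer no longer uses multiprocessing.
    if multiprocessing_chunksize is not None:
        warnings.warn(
            "multiprocessing_chunksize is deprecated and has no effect.",
            DeprecationWarning,
        )
    ...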

@vblagoje
Member Author

vblagoje commented Oct 4, 2022

@sjrl @masci let me know if we need to do anything else for this one

        except AttributeError:
            pass
        yield from predictions

    @classmethod
    def _create_datasets_chunkwise(cls, chunk, processor: Processor):
Contributor

@vblagoje Should the method _create_datasets_chunkwise also be removed? It was only used in _inference_with_multiprocessing.

Member Author

+1, spot on. Removed it, @sjrl.

Contributor

@sjrl sjrl left a comment

LGTM!

@vblagoje vblagoje merged commit 6cb4e93 into deepset-ai:main Oct 4, 2022
@vblagoje vblagoje deleted the fix_qa_mp_inferencing branch October 24, 2022 08:20

Successfully merging this pull request may close these issues.

QA inferencer very slow because of bad default multiprocessing settings