QA inferencer very slow because of bad default multiprocessing settings #3272

Closed
Timoeller opened this issue Sep 23, 2022 · 3 comments · Fixed by #3283
Labels: topic:reader, type:bug

Comments

@Timoeller (Contributor)

Describe the bug
When running pipeline.eval I noticed that it is very slow and also prints far too many tqdm progress lines.
So I benchmarked it with and without multiprocessing. The difference is striking:

MP on
Elapsed: 22.35

MP off
Elapsed: 6.079

To Reproduce
I benchmarked the pipeline.eval function on a V100 with only 10 labels and retriever top_k=10,
with FARMReader's num_processes parameter set to 0 (disable MP) or None (use all available processes); see the sketch below.
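
A minimal timing sketch along these lines (the model name, toy documents, and use of reader.predict instead of the full pipeline.eval are illustrative assumptions, not the exact benchmark script):

    # Sketch: compare FARMReader with multiprocessing on (None) vs off (0).
    import time
    from haystack.nodes import FARMReader
    from haystack.schema import Document

    docs = [Document(content=f"Fact {i}: Berlin is the capital of Germany.") for i in range(10)]

    for num_processes in (None, 0):  # None = use all available cores, 0 = disable multiprocessing
        reader = FARMReader("deepset/roberta-base-squad2", num_processes=num_processes)
        start = time.perf_counter()
        reader.predict(query="What is the capital of Germany?", documents=docs, top_k=10)
        print(f"num_processes={num_processes}: elapsed {time.perf_counter() - start:.2f}")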

Additional insights
I think the problematic part is in the FARMReader.predict method, which calls:

        predictions = self.inferencer.inference_from_objects(
            objects=inputs, return_json=False, multiprocessing_chunksize=1  # chunksize hard-coded to 1
        )

Since multiprocessing_chunksize is hard-coded to 1, the Inferencer splits up all the data, even when there is very little of it, and feeds it to the GPU one sample at a time. This results in tiny batches and therefore poor GPU utilization for basically all retriever + reader usage. One possible direction is sketched below.
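
For illustration only (this is a hypothetical helper, not necessarily what #3283 implements), the chunk size could be derived from the workload instead of being hard-coded:

    # Hypothetical helper: scale the multiprocessing chunk size with the number of
    # inputs so each worker receives reasonably large chunks instead of size-1 chunks.
    def choose_chunksize(num_objects: int, num_processes: int, max_chunksize: int = 20) -> int:
        if num_processes <= 0:  # multiprocessing disabled: keep everything in one chunk
            return max(1, num_objects)
        return max(1, min(max_chunksize, num_objects // max(1, num_processes)))

    # 10 documents across 8 workers -> chunks of 1; 1000 documents -> chunks of 20
    print(choose_chunksize(10, 8), choose_chunksize(1000, 8))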

@vblagoje vblagoje self-assigned this Sep 27, 2022
@vblagoje (Member)

Confirming the performance speedup when using the FARMReader predict API with MP turned off; see the Colab notebook.

@sjrl (Contributor) commented Sep 28, 2022

Considering the findings of issue #3289, my guess is that the multiprocessing in the Inferencer does not work properly together with the parallelism of HuggingFace's fast tokenizers, which causes the substantial slowdown. I haven't had a chance to test this specifically, but it would explain why @danielbichuetti still sees multiple CPUs being utilized when using the TransformersReader, and also when num_processes=0 is passed to the FARMReader (which turns off parallelism in the Inferencer). A quick way to probe this is sketched below.
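
For context, the Rust-based fast tokenizers control their internal thread pool via the TOKENIZERS_PARALLELISM environment variable, so one low-effort check (an assumption about the cause, not a confirmed fix) is to disable it before the reader spawns worker processes and re-run the timing comparison:

    # Assumption-based check: turn off the fast tokenizers' own thread parallelism
    # before the tokenizer is created, then compare num_processes=None vs num_processes=0 timings.
    import os
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    from haystack.nodes import FARMReader
    reader = FARMReader("deepset/roberta-base-squad2", num_processes=None)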

@vblagoje (Member)

Updated the notebook to include TransformersReader performance. TransformersReader is slightly faster than FARMReader (no multiprocessing). The performance difference in the Colab looks a bit larger than it actually is; I ran these simple performance tests several times in a more controlled environment, and TransformersReader is about 15% faster than FARMReader.
