QA inferencer very slow because of bad default multiprocessing settings #3272

Closed
Timoeller opened this issue Sep 23, 2022 · 3 comments · Fixed by #3283
Labels: topic:reader, type:bug

Comments

@Timoeller (Contributor)

Describe the bug
When running pipeline.eval I noticed that it is very slow and also prints far too many tqdm progress lines.
So I benchmarked it with and without multiprocessing. The difference is striking:

MP on
Elapsed: 22.35

MP off
Elapsed: 6.079

To Reproduce
I benchmarked the pipeline.eval function on a V100 with only 10 labels and retriever top_k=10,
with FARMReader's num_processes parameter set to 0 (disable MP) or None (use all available processes); see the sketch below.
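
A minimal timing sketch along these lines (the model name, toy documents, and use of reader.predict instead of the full pipeline.eval are illustrative assumptions, not the exact benchmark script):

    # Sketch: compare FARMReader with multiprocessing on (None) vs off (0).
    import time
    from haystack.nodes import FARMReader
    from haystack.schema import Document

    docs = [Document(content=f"Fact {i}: Berlin is the capital of Germany.") for i in range(10)]

    for num_processes in (None, 0):  # None = use all available cores, 0 = disable multiprocessing
        reader = FARMReader("deepset/roberta-base-squad2", num_processes=num_processes)
        start = time.perf_counter()
        reader.predict(query="What is the capital of Germany?", documents=docs, top_k=10)
        print(f"num_processes={num_processes}: elapsed {time.perf_counter() - start:.2f}")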

Additional insights
I think the problematic part is in the FARMReader.predict method, which calls:

        predictions = self.inferencer.inference_from_objects(
            objects=inputs, return_json=False, multiprocessing_chunksize=1  # chunksize hard-coded to 1
        )

Since multiprocessing_chunksize is hard-coded to 1, the Inferencer splits up all the data, even when there is very little of it, and feeds it to the GPU one sample at a time. This results in tiny batches and therefore poor GPU utilization for basically all retriever + reader usage. One possible direction is sketched below.
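
For illustration only (this is a hypothetical helper, not necessarily what #3283 implements), the chunk size could be derived from the workload instead of being hard-coded:

    # Hypothetical helper: scale the multiprocessing chunk size with the number of
    # inputs so each worker receives reasonably large chunks instead of size-1 chunks.
    def choose_chunksize(num_objects: int, num_processes: int, max_chunksize: int = 20) -> int:
        if num_processes <= 0:  # multiprocessing disabled: keep everything in one chunk
            return max(1, num_objects)
        return max(1, min(max_chunksize, num_objects // max(1, num_processes)))

    # 10 documents across 8 workers -> chunks of 1; 1000 documents -> chunks of 20
    print(choose_chunksize(10, 8), choose_chunksize(1000, 8))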

@vblagoje vblagoje self-assigned this Sep 27, 2022
@vblagoje (Member)

Confirming the performance speedup when using the FARMReader predict API with MP turned off; see the Colab notebook.

@sjrl (Contributor) commented Sep 28, 2022

Considering the findings of issue #3289, my guess is that the multiprocessing in the Inferencer does not work properly together with the parallelism of HuggingFace's fast tokenizers, which causes the substantial slowdown. I haven't had a chance to test this specifically, but it would explain why @danielbichuetti still sees multiple CPUs being utilized when using the TransformersReader, and also when num_processes=0 is passed to the FARMReader (which turns off parallelism in the Inferencer). A quick way to probe this is sketched below.
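
For context, the Rust-based fast tokenizers control their internal thread pool via the TOKENIZERS_PARALLELISM environment variable, so one low-effort check (an assumption about the cause, not a confirmed fix) is to disable it before the reader spawns worker processes and re-run the timing comparison:

    # Assumption-based check: turn off the fast tokenizers' own thread parallelism
    # before the tokenizer is created, then compare num_processes=None vs num_processes=0 timings.
    import os
    os.environ["TOKENIZERS_PARALLELISM"] = "false"

    from haystack.nodes import FARMReader
    reader = FARMReader("deepset/roberta-base-squad2", num_processes=None)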

@vblagoje (Member)

Updated the notebook to include TransformersReader performance. TransformersReader is slightly faster than FARMReader (no multiprocessing). The performance difference in the Colab looks a bit larger than it actually is; I ran these simple performance tests several times in a more controlled environment, and TransformersReader is about 15% faster than FARMReader.
