
FARMReader parallelism issues #3289

Closed
1 task done
danielbichuetti opened this issue Sep 27, 2022 · 8 comments
Labels
topic:pipeline type:bug Something isn't working

Comments

@danielbichuetti
Contributor

danielbichuetti commented Sep 27, 2022

Describe the bug
When running FARMReader without a GPU, it spawns the inferencers and starts processing, but suddenly all processes drop to 0% CPU usage and it never returns any results. I noticed that despite spawning 7 inferencers, it's using just 1 CPU core.

This has been tested on a notebook, and on 3 Azure instances.

When using TransformersReader with the same setup (models, instances, document stores), the Reader spawns 7 inferencers and all 7 CPU cores get fully used. Results appear in under 11m 5s.

Error message
FARMReader gets into a deadlock state

Expected behavior
The Reader should use all CPU cores (it's not limited by any command), like TransformersReader does, and should return results.

Additional context

To Reproduce
Create a DocumentStore, create a BM25Retriever, create a FARMReader, and use an ExtractiveQA pipeline to run the prediction query.

FAQ Check

System:

  • OS: Ubuntu
  • GPU/CPU: i7 and Xeon
  • Haystack version (commit or version number): 1.9.0
  • DocumentStore: OpenSearchDocumentStore
  • Reader: FARMReader
  • Retriever: BM25Retriever
@sjrl
Contributor

sjrl commented Sep 28, 2022

Hi @danielbichuetti thanks for opening the issue. This may be related to the multiprocessing in the FARMReader causing problems with the multiprocessing in the Inferencer. When initializing the FARMReader could you pass the option num_processes=0? So

reader = FARMReader(model_name_or_path="MODEL_NAME", num_processes=0)

and see if that prevents this thread locking from happening?

@danielbichuetti danielbichuetti changed the title FARMReader thread locking FARMReader parallelism issues Sep 28, 2022
@danielbichuetti
Contributor Author

danielbichuetti commented Sep 28, 2022

Hello @sjrl, thank you for your suggestion. It worked. It seems to be not thread locking but a deadlock caused by the multiprocessing.

After the suggested change, all cores were used and there were no deadlocks.

@sjrl
Contributor

sjrl commented Sep 28, 2022

> Hello @sjrl Thank you for your suggestion. It worked. It seems to be not a thread locking but a deadlock because of the multiprocessing.

@vblagoje @Timoeller This sounds like another good reason to remove multiprocessing from the FARMReader.

@danielbichuetti
Contributor Author

danielbichuetti commented Sep 28, 2022

Just to keep as a note: after setting num_processes=0, these are the results.

FARMReader

CPU times: user 7min 22s, sys: 1min 5s, total: 8min 27s
Wall time: 1min 6s

TransformersReader

CPU times: user 11min 45s, sys: 9.28 s, total: 11min 54s
Wall time: 1min 30s

This is a good first indication that even with multiprocessing disabled, FARMReader's performance is still superior to TransformersReader's default setting.
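For reference on how to read those numbers, here is a minimal stdlib sketch (not Haystack code; the busy function and pool size are made up for illustration). "CPU times: user" accumulates across all cores while "Wall time" is elapsed clock time, so user time well above wall time is what multi-core utilization looks like:

```python
import os
import time
from multiprocessing import Pool

def busy(n):
    # Made-up CPU-bound work standing in for model inference.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    t0, w0 = os.times(), time.perf_counter()
    with Pool(4) as pool:
        pool.map(busy, [2_000_000] * 8)
    t1, w1 = os.times(), time.perf_counter()
    # os.times() reports parent and (reaped) child CPU time separately.
    cpu = (t1.user - t0.user) + (t1.children_user - t0.children_user)
    wall = w1 - w0
    print(f"cpu={cpu:.2f}s wall={wall:.2f}s (~{cpu / max(wall, 1e-9):.1f} cores busy)")
```

A cpu/wall ratio near the pool size means the workers really ran in parallel, which is what the FARMReader numbers above show.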

@sjrl
Contributor

sjrl commented Sep 28, 2022

@Timoeller @vblagoje Tagging you guys to check out the above timings in regards to issue #3272

@vblagoje
Member

@danielbichuetti can you confirm that you used the same models in both readers? I ran tests similar to this one several times, and my measurements indicated TransformersReader is slightly faster (by ~15%). I didn't rely on Colab but ran the tests directly on bare metal to minimize discrepancies.

@danielbichuetti
Contributor Author

danielbichuetti commented Sep 29, 2022

When I received the notification about the performance discussion in the other topic, I started a new test using your notebook and the same dataset. The only difference I could see at first is that I tested with the GPU disabled, only the CPU working. Tests are being done on an 8-CPU Xeon with 56 GB RAM in Azure ML Studio.

As soon as the tests finish (I'll run them 10 times, just to be certain), I'll post an update.

UPDATE: I just checked the model used when FARMReader outperformed TransformersReader; it was deepset/xlm-roberta-large-squad2 in both readers. One major difference I noticed is that in my tests the readers received a giant list of Documents sent at once, while in your notebook you are sending one Document at a time. I'll try both ways. It will take a bit longer, but then we can have a better idea.

@danielbichuetti
Contributor Author

@vblagoje These are the results (median):

Hardware: 8vCPU Intel® Xeon® Platinum 8370C (Ice Lake) / 56 GB RAM / 400 GB SSD
Platform: Azure ML
Model: deepset/roberta-base-squad2

Individual document, multiple calls, CPU

FARMReader

CPU times: user 2h 52min 5s, sys: 3min 4s, total: 2h 55min 9s
Wall time: 27min 40s

TransformersReader

CPU times: user 1h 46min 6s, sys: 1min 18s, total: 1h 47min 24s
Wall time: 13min 53s

List of documents, one call, CPU

FARMReader

CPU times: user 1h 38min 35s, sys: 18min 41s, total: 1h 57min 16s
Wall time: 15min 32s

TransformersReader

CPU times: user 1h 41min 43s, sys: 1min 15s, total: 1h 42min 59s
Wall time: 13min 16s

However, I got interesting results when running an ExtractiveQAPipeline: FARMReader (median wall time: 1m 34s) vs. TransformersReader (median wall time: 1m 32s). The tests were done using OpenSearchDocumentStore, a BM25Retriever, and the Reader. I ran them 10 times: results were similar to my first tests in 4 cases, there were 2 deadlocks where FARMReader stopped using the CPU, and 4 runs where TransformersReader was superior. This could be a result of:

  • network instability (DocumentStore is an OpenSearch cluster)
  • cloud instance instability

The deadlock is a rare situation when running FARMReader with num_processes=0, but extremely common when leaving it at the default value.
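For context on why the workaround helps (my understanding, sketched with stdlib code only, not Haystack internals): forking worker processes from a parent whose threads, e.g. OpenMP/BLAS pools inside an already-loaded model, hold locks at fork time can deadlock the children. num_processes=0 sidesteps this by skipping the extra processes entirely; the usual alternative is the "spawn" start method, where children start fresh instead of inheriting the parent's locked state:

```python
import multiprocessing as mp

def square(x):
    # Trivial stand-in for per-chunk inference work.
    return x * x

if __name__ == "__main__":
    # "spawn" starts workers as fresh interpreters, so they cannot inherit
    # a lock that a parent thread happened to hold at fork time; the cost
    # is slower worker startup compared to "fork".
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # -> [1, 4, 9]
```

This is only an illustration of the mechanism, not a claim about where exactly FARMReader's Inferencer forks.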

These are the used pipelines:

# FARMReader pipeline
retriever = BM25Retriever(document_store=doc_store, top_k=20)
reader = FARMReader(model_name_or_path="deepset/xlm-roberta-large-squad2", top_k=2, num_processes=0)
pipe = ExtractiveQAPipeline(reader, retriever)

# TransformersReader pipeline
retriever = BM25Retriever(document_store=doc_store, top_k=20)
reader = TransformersReader(model_name_or_path="deepset/xlm-roberta-large-squad2", top_k=2)
pipe = ExtractiveQAPipeline(reader, retriever)

You can contact me privately so I can share credentials for the DocumentStore (it's an internal testing store, in two flavors: giant documents and small documents).
