
FARMReader parallelism issues #3289

Closed
1 task done
danielbichuetti opened this issue Sep 27, 2022 · 8 comments
Labels
topic:pipeline type:bug Something isn't working

Comments

@danielbichuetti
Contributor

danielbichuetti commented Sep 27, 2022

Describe the bug
When running FARMReader without a GPU, it spawns the inferencers and starts processing, but suddenly all processes drop to 0% CPU usage and it never returns any results. I noticed that despite spawning 7 inferencers, it's using just 1 CPU core.

This has been tested on a notebook, and on 3 Azure instances.

When using TransformersReader with the same setup (models, instances, document stores), the Reader spawns 7 inferencers and all 7 CPU cores get fully used. Results appear in under 11m 5s.

Error message
FARMReader gets into a deadlock state

Expected behavior
The Reader should use all CPU cores (it's not limited by any command), like TransformersReader does, and should return results.

Additional context

To Reproduce
Create a DocumentStore, create a BM25Retriever, create a FARMReader, and use an ExtractiveQA pipeline to run the prediction query.

FAQ Check

System:

  • OS: Ubuntu
  • GPU/CPU: i7 and Xeon
  • Haystack version (commit or version number): 1.9.0
  • DocumentStore: OpenSearchDocumentStore
  • Reader: FARMReader
  • Retriever: BM25Retriever
@sjrl
Contributor

sjrl commented Sep 28, 2022

Hi @danielbichuetti thanks for opening the issue. This may be related to the multiprocessing in the FARMReader causing problems with the multiprocessing in the Inferencer. When initializing the FARMReader could you pass the option num_processes=0? So

reader = FARMReader(model_name_or_path="MODEL_NAME", num_processes=0)

and see if that prevents this thread locking from happening?

@danielbichuetti danielbichuetti changed the title FARMReader thread locking FARMReader parallelism issues Sep 28, 2022
@danielbichuetti
Contributor Author

danielbichuetti commented Sep 28, 2022

Hello @sjrl, thank you for your suggestion. It worked. It seems to be not thread locking but a deadlock caused by the multiprocessing.

After the suggested change, all cores were used and there were no deadlocks.

@sjrl
Contributor

sjrl commented Sep 28, 2022

> Hello @sjrl Thank you for your suggestion. It worked. It seems to be not a thread locking but a deadlock because of the multiprocessing.

@vblagoje @Timoeller This sounds like another good reason to remove multiprocessing from the FARMReader.

@danielbichuetti
Contributor Author

danielbichuetti commented Sep 28, 2022

Just to keep as a note: after setting num_processes=0, these are the results.

FARMReader

CPU times: user 7min 22s, sys: 1min 5s, total: 8min 27s
Wall time: 1min 6s

TransformersReader

CPU times: user 11min 45s, sys: 9.28 s, total: 11min 54s
Wall time: 1min 30s

This is a good first indication that even with multiprocessing disabled, FARMReader's performance is still superior to TransformersReader's default setting.
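For reference on how to read those numbers, here is a minimal stdlib sketch (not Haystack code; the busy function and pool size are made up for illustration). "CPU times: user" accumulates across all cores while "Wall time" is elapsed clock time, so user time well above wall time is what multi-core utilization looks like:

```python
import os
import time
from multiprocessing import Pool

def busy(n):
    # Made-up CPU-bound work standing in for model inference.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    t0, w0 = os.times(), time.perf_counter()
    with Pool(4) as pool:
        pool.map(busy, [2_000_000] * 8)
    t1, w1 = os.times(), time.perf_counter()
    # os.times() reports parent and (reaped) child CPU time separately.
    cpu = (t1.user - t0.user) + (t1.children_user - t0.children_user)
    wall = w1 - w0
    print(f"cpu={cpu:.2f}s wall={wall:.2f}s (~{cpu / max(wall, 1e-9):.1f} cores busy)")
```

A cpu/wall ratio near the pool size means the workers really ran in parallel, which is what the FARMReader numbers above show.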

@sjrl
Contributor

sjrl commented Sep 28, 2022

@Timoeller @vblagoje Tagging you guys to check out the above timings in regards to issue #3272

@vblagoje
Member

@danielbichuetti can you confirm that you used the same models in both readers? I ran tests similar to this one several times, and my measurements indicated TransformersReader is slightly faster (by ~15%). I didn't rely on Colab but ran the tests directly on bare metal to minimize discrepancies.

@danielbichuetti
Contributor Author

danielbichuetti commented Sep 29, 2022

When I received the notification about the performance discussion in the other topic, I started a new test using your notebook and the same dataset. The only difference I could see at first is that I tested with the GPU disabled, only the CPU working. Tests are being done on an 8-CPU Xeon with 56 GB RAM in Azure ML Studio.

As soon as the tests finish (I'll run them 10 times, just to be certain), I'll post an update.

UPDATE: I just checked the model used when FARMReader outperformed TransformersReader; it was deepset/xlm-roberta-large-squad2 in both readers. One major difference I noticed is that in my tests the readers received a giant list of Documents sent at once, while in your notebook you are sending one Document at a time. I'll try both ways. It will take a bit longer, but then we can have a better idea.

@danielbichuetti
Contributor Author

@vblagoje These are the results (median):

Hardware: 8vCPU Intel® Xeon® Platinum 8370C (Ice Lake) / 56 GB RAM / 400 GB SSD
Platform: Azure ML
Model: deepset/roberta-base-squad2

Individual document, multiple calls, CPU

FARMReader

CPU times: user 2h 52min 5s, sys: 3min 4s, total: 2h 55min 9s
Wall time: 27min 40s

TransformersReader

CPU times: user 1h 46min 6s, sys: 1min 18s, total: 1h 47min 24s
Wall time: 13min 53s

List of documents, one call, CPU

FARMReader

CPU times: user 1h 38min 35s, sys: 18min 41s, total: 1h 57min 16s
Wall time: 15min 32s

TransformersReader

CPU times: user 1h 41min 43s, sys: 1min 15s, total: 1h 42min 59s
Wall time: 13min 16s

However, I got interesting results when running an ExtractiveQAPipeline: FARMReader (median wall time: 1m 34s) vs. TransformersReader (median wall time: 1m 32s). The tests were done using OpenSearchDocumentStore, a BM25Retriever, and the Reader. I ran them 10 times: results were similar to my first tests in 4 cases, there were 2 deadlocks where FARMReader stopped using the CPU, and 4 runs where TransformersReader was superior. This could be a result of:

  • network instability (DocumentStore is an OpenSearch cluster)
  • cloud instance instability

The deadlock is a rare situation when running FARMReader with num_processes=0, but extremely common when leaving it at the default value.
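For context on why the workaround helps (my understanding, sketched with stdlib code only, not Haystack internals): forking worker processes from a parent whose threads, e.g. OpenMP/BLAS pools inside an already-loaded model, hold locks at fork time can deadlock the children. num_processes=0 sidesteps this by skipping the extra processes entirely; the usual alternative is the "spawn" start method, where children start fresh instead of inheriting the parent's locked state:

```python
import multiprocessing as mp

def square(x):
    # Trivial stand-in for per-chunk inference work.
    return x * x

if __name__ == "__main__":
    # "spawn" starts workers as fresh interpreters, so they cannot inherit
    # a lock that a parent thread happened to hold at fork time; the cost
    # is slower worker startup compared to "fork".
    ctx = mp.get_context("spawn")
    with ctx.Pool(2) as pool:
        print(pool.map(square, [1, 2, 3]))  # -> [1, 4, 9]
```

This is only an illustration of the mechanism, not a claim about where exactly FARMReader's Inferencer forks.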

These are the used pipelines:

# FARMReader pipeline
retriever = BM25Retriever(document_store=doc_store, top_k=20)
reader = FARMReader(model_name_or_path="deepset/xlm-roberta-large-squad2", top_k=2, num_processes=0)
pipe = ExtractiveQAPipeline(reader, retriever)

# TransformersReader pipeline
retriever = BM25Retriever(document_store=doc_store, top_k=20)
reader = TransformersReader(model_name_or_path="deepset/xlm-roberta-large-squad2", top_k=2)
pipe = ExtractiveQAPipeline(reader, retriever)

You can contact me privately so I can share credentials for the DocumentStore (it's an internal testing store, in two flavors: giant documents and small documents).
