
FARMReader slow #1077

Closed · bappctl opened this issue May 19, 2021 · 31 comments

@bappctl commented May 19, 2021

Question
I am running one of the samples in a K8s pod (GPU). It gets stuck in the FARMReader for a long time (30+ minutes) and times out. Any idea why? All I added were two .txt documents.

    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                        use_gpu=True, return_no_answer=True, no_ans_boost=0.7, context_window_size=200)

    retriever = ElasticsearchRetriever(document_store=document_store)

    pipe = ExtractiveQAPipeline(reader, retriever)

    # predict n answers
    prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)

[2021-05-19 23:34:10 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)
05/19/2021 23:34:10 - INFO - farm.infer - Got ya 23 parallel workers to do inference ...
05/19/2021 23:34:10 - INFO - farm.infer - 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
05/19/2021 23:34:10 - INFO - farm.infer - /w\ /w\ /w\ /w\ /w\ /w\ /w\ /|\ /w\ /w\ /w\ /w\ /w\ /w\ /|\ /w\ /|\ /|\ /|\ /|\ /w\ /w\ /|
05/19/2021 23:34:10 - INFO - farm.infer - /'\ / \ /'\ /'\ / \ / \ /'\ /'\ /'\ /'\ /'\ /'\ / \ /'\ /'\ / \ /'\ /'\ /'\ /'\ / \ / \ /'
05/19/2021 23:34:10 - INFO - farm.infer -
05/19/2021 23:34:10 - INFO - elasticsearch - POST http://10.x.x.x:8071/sidx/_search [status:200 request:0.003s]
05/19/2021 23:34:10 - WARNING - farm.data_handler.dataset - Could not determine type for feature 'labels'. Converting now to a tensor of default type long.
05/19/2021 23:34:10 - WARNING - farm.data_handler.dataset - Could not determine type for feature 'labels'. Converting now to a tensor of default type long.
[2021-05-19 23:34:40 +0000] [8] [WARNING] Worker graceful timeout (pid:8)
[2021-05-19 23:34:42 +0000] [8] [INFO] Worker exiting (pid: 8)


@tholor (Member) commented May 21, 2021

Hey @bappctl,

  • How long were your two txt documents?
  • I assume you are running this on CPU nodes in k8?
  • What happened before [2021-05-19 23:34:10 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)?

One thing you can try to narrow down the root cause: disable multiprocessing for inference. You can do that by passing num_processes=0 to the FARMReader.
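
A minimal sketch of that change, reusing the snippet from the issue description (document_store and question are assumed to be defined as before):

    from haystack.reader.farm import FARMReader
    from haystack.retriever.sparse import ElasticsearchRetriever
    from haystack.pipeline import ExtractiveQAPipeline

    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                        use_gpu=True, return_no_answer=True, no_ans_boost=0.7,
                        context_window_size=200,
                        num_processes=0)  # disable multiprocessing for inference
    retriever = ElasticsearchRetriever(document_store=document_store)
    pipe = ExtractiveQAPipeline(reader, retriever)
    prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)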

As a side note: With only two docs the retriever is basically useless as top_k_retriever=10 will always return your two documents.

@bappctl (Author) commented May 25, 2021

@tholor

  • Just a one-page document (two of them)
  • Running on GPU nodes
  • It times out in the FARMReader
  • Agree on the side note; that's the next step after overcoming the above issue

TransformersReader works, but I couldn't get FARMReader to work. In the above code I set num_processes=0 as suggested, but it still gets stuck for almost 40 minutes (I had to kill the process):

[screenshot]

I notice a peculiar behavior with FARMReader: when I kill the pod, I see the correct result printed in the container log before the app exits; if I don't kill the pod, it stays stuck as mentioned above.

[screenshot]

I see this behaviour only with FARMReader; if I switch to TransformersReader, the app works fine as expected.

@bappctl (Author) commented May 26, 2021

@tholor With TransformersReader, how do I train on custom data (similar to FARMReader's train())?

@tholor (Member) commented May 26, 2021

@oryx1729 have you seen such an issue in our kubernetes deployment or have an idea what might cause the deadlock here?

@bappctl (Author) commented May 26, 2021

@tholor Finally made it work. It's something to do with Gunicorn threads. When I hit the issue I had it set to 3 threads (the bare minimum); once I removed that and went with workers and worker-connections instead, it started working with FARMReader. Threads didn't cause any issue with TransformersReader, though; it happens only with FARMReader.

@oryx1729 (Contributor):
Hi @bappctl, can you share the parameters you're using for running Gunicorn? We have been using this for our deployments. It's possible that you need a higher timeout value here.
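
For illustration, a Gunicorn invocation with a raised worker timeout might look like this (main:app is just a placeholder for the WSGI app object):

    gunicorn main:app --bind :8061 --workers 1 --timeout 300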

@bappctl (Author) commented May 26, 2021

Hi @oryx1729
[PREDICT]
Even with the Gunicorn config below I see issues: the first predict request returns fine, but subsequent requests stall and get stuck.

    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True, num_processes=0)
    pipe = ExtractiveQAPipeline(reader, retriever)
    prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)

The deployment pod has 4 CPUs, 10 GB RAM, and 1 GPU. I tried with just 2 documents of no more than 2 pages each.

CMD ["gunicorn", "--name", "hs", "--timeout", "1800", "--workers", "5", "--worker-connections","2","--worker-class", "gevent", "--bind", ":8061", "main:app"]

The other thing I notice is that GPU memory is not freed after predict.

[screenshot]

[TRAIN]
In parallel, I have had no luck with FARMReader.train(); it stalls forever.

 reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True, num_processes=0)
 reader.train(data_dir=model, train_filename="squad.json", n_epochs=20, dev_split = 0, save_dir=model)

[screenshot]

@oryx1729 (Contributor):
@bappctl are you using FastAPI for the APIs? In that case, the Gunicorn worker class should be uvicorn.workers.UvicornWorker.

Can you share the complete code for your API endpoint?
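
For reference, serving a FastAPI app with Gunicorn would look something like this sketch (main:app standing in for your ASGI app object):

    gunicorn main:app --workers 1 --worker-class uvicorn.workers.UvicornWorker --bind :8061 --timeout 300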

@bappctl (Author) commented May 28, 2021

@oryx1729

I am not using FastAPI. No luck with either train or predict.

import json
import os
import config
import logging
from flask_cors import CORS
from flask import Flask, request, jsonify
from haystack import Finder
from haystack.preprocessor.cleaning import clean_wiki_text
from haystack.preprocessor.utils import convert_files_to_dicts
from haystack.reader.farm import FARMReader
from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
from haystack.file_converter.pdf import PDFToTextConverter
from haystack.retriever.dense import DensePassageRetriever
from haystack.retriever.sparse import ElasticsearchRetriever
from haystack.pipeline import ExtractiveQAPipeline
from haystack.reader.transformers import TransformersReader
from haystack.utils import print_answers

app = Flask(__name__)
CORS(app)

@app.route('/haystack/es', methods=['POST'])  # note: the route needs a leading slash
def es_store():
    if request.files:
        index = request.form['index']
        doc = request.files["doc"]
        eshost = request.form['host']
        esport = request.form['port']
        local_dir = '/home/data'

        file_path = os.path.join(local_dir, doc.filename)
        doc.save(file_path)
        document_store = ElasticsearchDocumentStore(host=eshost, port=esport, username='', password='', index=index)
        dicts = convert_files_to_dicts(
            local_dir,  # convert the files saved under local_dir
            clean_func=clean_wiki_text,
            split_paragraphs=False)
        document_store.write_documents(dicts)
        os.remove(file_path)
        return json.dumps({'code':200,'status':'success','message': 'File uploaded.'})   
    else:
        return json.dumps({'status':'Failed','message': 'File upload failed.'})

@app.route('/haystack/train', methods=['POST'])
def train():
    local_dir = '/home/data'
    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True, num_processes=0)
    reader.train(data_dir=local_dir, train_filename="squad.json", n_epochs=20, dev_split = 0, save_dir=local_dir)
    return json.dumps({'code':'200','status':'success','message': 'Train successful.'})

@app.route('/haystack/predict', methods=['POST'])
def predict():
    question = request.form['question']
    index = request.form['index']
    eshost = request.form['host']
    esport = request.form['port']
    document_store = ElasticsearchDocumentStore(host=eshost, port=esport, username='', password='', index=index)
    retriever = ElasticsearchRetriever(document_store= document_store)
    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad",  use_gpu=True)
    pipe = ExtractiveQAPipeline(reader, retriever)
    prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)
    answer = []
    for res in prediction['answers']:
        answer.append(res)
    return json.dumps({'code':200,'status':'success','message': 'Predict successful.', 'result': answer})   
    
if __name__ == '__main__':
    app.run(host='0.0.0.0', port=8061)    

@oryx1729 (Contributor) commented May 28, 2021

Hi @bappctl, in /haystack/predict the model is reloaded each time a question is asked. It would be more efficient to declare the pipeline outside the predict() method, so it is loaded only once when the Flask app starts. Something like this:

# configure host, port, and index once at startup (e.g. from environment variables)
document_store = ElasticsearchDocumentStore(host=eshost, port=esport, username='', password='', index=index)
retriever = ElasticsearchRetriever(document_store=document_store)
reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True)
pipe = ExtractiveQAPipeline(reader, retriever)

@app.route('/haystack/predict', methods=['POST'])
def predict():
    question = request.form['question']
    prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)
    answer = []
    for res in prediction['answers']:
        answer.append(res)
    return json.dumps({'code': 200, 'status': 'success', 'message': 'Predict successful.', 'result': answer})

@bappctl (Author) commented May 28, 2021

@oryx1729
Irrespective of that, the same code (even with the model loaded on every predict) does not get stuck with TransformersReader. I will give your suggestion a try.

@bappctl (Author) commented May 29, 2021

@oryx1729
As a first step I am just trying to save the model locally and load it from there. Below is what I get; is there anything I'm missing? I'm doing nothing special, it's very straightforward.

docker
CMD ["gunicorn", "--name", "haystack", "--timeout", "1800", "--workers", "5", "--worker-connections","2","--worker-class", "gevent", "--bind", ":8091", "main:app"]

main.py

    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad", use_gpu=True, num_processes=0)
    reader.train(data_dir=modeldir, train_filename="squad.json", n_epochs=20, dev_split = 0, save_dir=modeldir)

05/29/2021 21:19:32 - INFO - farm.utils - Using device: CUDA
05/29/2021 21:19:32 - INFO - farm.utils - Number of GPUs: 1
05/29/2021 21:19:32 - INFO - farm.utils - Distributed Training: False
05/29/2021 21:19:32 - INFO - farm.utils - Automatic Mixed Precision: None
05/29/2021 21:19:32 - INFO - filelock - Lock 139883311981008 acquired on /root/.cache/huggingface/transformers/ab70e5f489e00bb2df55e4bae145e9b1c7dc794cfa0fd8228e1299d400613429.f3874c2af5400915dc843c97f502c5d30edc728e5ec3b60c4bd6958e87970f75.lock
Downloading: 100%|██████████| 451/451 [00:00<00:00, 793kB/s]
05/29/2021 21:19:33 - INFO - filelock - Lock 139883311981008 released on /root/.cache/huggingface/transformers/ab70e5f489e00bb2df55e4bae145e9b1c7dc794cfa0fd8228e1299d400613429.f3874c2af5400915dc843c97f502c5d30edc728e5ec3b60c4bd6958e87970f75.lock
05/29/2021 21:19:34 - INFO - filelock - Lock 139879304556368 acquired on /root/.cache/huggingface/transformers/b00ff18397f70f871bd8f11949a3c5ffd5fb18fd6d4e3df947dc386950b8d59d.69a963759b72d26fb77afa9b7d43c9107b99dfe7ca78af52e0237c8d001c7dcf.lock
Downloading: 100%|██████████| 265M/265M [00:25<00:00, 10.5MB/s]
05/29/2021 21:20:00 - INFO - filelock - Lock 139879304556368 released on /root/.cache/huggingface/transformers/b00ff18397f70f871bd8f11949a3c5ffd5fb18fd6d4e3df947dc386950b8d59d.69a963759b72d26fb77afa9b7d43c9107b99dfe7ca78af52e0237c8d001c7dcf.lock
05/29/2021 21:20:40 - INFO - filelock - Lock 139879093416848 acquired on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
Downloading: 100%|██████████| 232k/232k [00:00<00:00, 1.93MB/s]
05/29/2021 21:20:41 - INFO - filelock - Lock 139879093416848 released on /root/.cache/huggingface/transformers/e12f02d630da91a0982ce6db1ad595231d155a2b725ab106971898276d842ecc.d789d64ebfe299b0e416afc4a169632f903f693095b4629a7ea271d5a0cf2c99.lock
05/29/2021 21:20:41 - INFO - filelock - Lock 139879090118160 acquired on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
Downloading: 100%|██████████| 466k/466k [00:00<00:00, 2.22MB/s]
05/29/2021 21:20:42 - INFO - filelock - Lock 139879090118160 released on /root/.cache/huggingface/transformers/475d46024228961ca8770cead39e1079f135fd2441d14cf216727ffac8d41d78.7f2721073f19841be16f41b0a70b600ca6b880c8f3df6f3535cbc704371bdfa4.lock
05/29/2021 21:20:42 - WARNING - farm.utils - ML Logging is turned off. No parameters, metrics or artifacts will be logged to MLFlow.
05/29/2021 21:20:42 - INFO - farm.utils - Using device: CUDA
05/29/2021 21:20:42 - INFO - farm.utils - Number of GPUs: 1
05/29/2021 21:20:42 - INFO - farm.utils - Distributed Training: False
05/29/2021 21:20:42 - INFO - farm.utils - Automatic Mixed Precision: None

[2021-05-29 21:26:50 +0000] [1] [INFO] Handling signal: term
[2021-05-29 21:26:51 +0000] [9] [INFO] Worker exiting (pid: 9)
[2021-05-29 21:26:51 +0000] [10] [INFO] Worker exiting (pid: 10)
[2021-05-29 21:26:51 +0000] [35] [INFO] Worker exiting (pid: 35)
[2021-05-29 21:26:51 +0000] [8] [INFO] Worker exiting (pid: 8)
[2021-05-29 21:27:21 +0000] [1] [INFO] Shutting down: Master


If I reduce it to 1 worker:

[2021-05-29 22:10:44 +0000] [1] [CRITICAL] WORKER TIMEOUT (pid:8)
05/29/2021 22:10:44 - INFO - farm.infer - Got ya 23 parallel workers to do inference ...
[2021-05-29 22:10:45 +0000] [1] [WARNING] Worker with pid 8 was terminated due to signal 9
[2021-05-29 22:10:45 +0000] [97] [INFO] Booting worker with pid: 97
05/29/2021 22:10:46 - INFO - faiss.loader - Loading faiss with AVX2 support.
05/29/2021 22:10:46 - INFO - faiss.loader - Loading faiss.
05/29/2021 22:10:46 - INFO - farm.modeling.prediction_head - Better speed can be achieved with apex installed from https://www.github.com/nvidia/apex .


tholor assigned oryx1729 and unassigned tholor on Jun 1, 2021
@oryx1729 (Contributor):
Hi @bappctl, apologies for the delay in getting back on this.

A few observations in the case above:

  • --workers set to 5 might spin up 5 instances of the model, so ideally start with 1 worker first and check that everything works
  • in the run with 1 worker, I see an error in the logs: Worker with pid 8 was terminated due to signal 9. Could it be that the pod is running out of memory? Can you ensure that sufficient memory resources are available?
  • the logs say INFO - farm.infer - Got ya 23 parallel workers to do inference ..., however, in the code snippet the FARMReader has num_processes=0. Could it be that you were looking at the wrong logs, or that a different version of the code was deployed?
  • is there a specific reason you want to use Flask over FastAPI? There's a rest_api module in Haystack written with FastAPI; you could extend/adapt it for your use case (see the sketch below).
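
For illustration only, a minimal FastAPI sketch in the spirit of that suggestion (not the actual rest_api module; host, port, and index values are placeholders, and form parsing needs python-multipart installed), with the pipeline built once at startup:

    from fastapi import FastAPI, Form
    from haystack.document_store.elasticsearch import ElasticsearchDocumentStore
    from haystack.retriever.sparse import ElasticsearchRetriever
    from haystack.reader.farm import FARMReader
    from haystack.pipeline import ExtractiveQAPipeline

    app = FastAPI()

    # build the pipeline once at startup; every request reuses it
    document_store = ElasticsearchDocumentStore(host="elasticsearch", port=9200,
                                                username='', password='', index="document")
    retriever = ElasticsearchRetriever(document_store=document_store)
    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad",
                        use_gpu=True, num_processes=0)
    pipe = ExtractiveQAPipeline(reader, retriever)

    @app.post("/haystack/predict")
    def predict(question: str = Form(...)):
        prediction = pipe.run(query=question, top_k_retriever=10, top_k_reader=3)
        return {"code": 200, "status": "success", "result": prediction["answers"]}

Such an app would then be served with Gunicorn and the uvicorn.workers.UvicornWorker class, as mentioned earlier in the thread.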

@bappctl (Author) commented Jun 26, 2021

@oryx1729 No worries. I tried with 1 worker too; it didn't help. After the failed tries I dismantled that instance and moved on, so unfortunately I can't verify it now. I will look into the last option; I have no preference between Flask and FastAPI, all I want is to get it running successfully. So far no luck. I will give it another try and update you.

@bappctl (Author) commented Jul 16, 2021

@oryx1729

I went with your FastAPI suggestion. It works.

I have a couple of questions:

  1. Does FARMReader train() auto-save the model, or should save() be explicitly invoked to save the trained model?
  2. Is there a way to instantiate ElasticsearchDocumentStore without an index and then assign the index later?
  3. After a few sequential predict queries the GPU freezes. Is Haystack not releasing the memory after a query completes successfully?
     I get [CRITICAL] WORKER TIMEOUT (pid:8)
     [WARNING] Worker with pid 8 was terminated due to signal 9
     BUT if I set num_processes=0 I don't hit the issue. Even a small increase of num_processes to 2 causes the issue (the subsequent query doesn't get processed and stays stuck sending the request until it is cancelled and resent).

@oryx1729 (Contributor):
Hi @bappctl,

does FarmReader train() auto save the model or should .save() be explicitly invoked to save the trained model?

save() is called inside train(), so you do not need an explicit call.

Is there a way to instantiate ElasticsearchDocumentStore without index and then assign index later?

Can you provide more details of the use case here? An instance of ElasticsearchDocumentStore must have an index assigned. Based on the create_index parameter, it will try to create a new index or connect to an existing one.
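
For illustration, a minimal sketch with placeholder host and index values:

    from haystack.document_store.elasticsearch import ElasticsearchDocumentStore

    # connects to the "document" index, creating it first if it does not exist yet
    document_store = ElasticsearchDocumentStore(host="localhost", port=9200,
                                                username='', password='',
                                                index="document", create_index=True)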

after few sequential predict queries the GPU freezes, is haystack not releasing the memory after successful query completion?

It seems like you're running into memory issues when using multiprocessing (num_processes > 0). You could try increasing the memory if possible, or use num_processes=0.

@bappctl (Author) commented Jul 28, 2021

@oryx1729
Ignore my question about the index; it's no longer relevant.

The other question I have: sometimes during training the process gets killed for some reason and the model save() fails; I then run into the issue below, and it no longer trains and keeps throwing the same error.

OSError: Unable to load weights from pytorch checkpoint file for language_model.bin

If I replace language_model.bin with the old file, the error goes away, but that's not the right approach. How can I overcome it?

@oryx1729 (Contributor):
Hi @bappctl, can you share the full error stack trace that you get when the process is killed? What version of PyTorch are you using?

@bappctl (Author) commented Jul 28, 2021

@oryx1729

pytorch 1.7.1 + cu110

----- error stack ---

04:58:03 +0000] [282] [ERROR] Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.7/dist-packages/uvicorn/protocols/http/httptools_impl.py", line 398, in run_asgi
    result = await app(self.scope, self.receive, self.send)
  File "/usr/local/lib/python3.7/dist-packages/uvicorn/middleware/proxy_headers.py", line 45, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/fastapi/applications.py", line 199, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/applications.py", line 112, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/middleware/errors.py", line 181, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/dist-packages/starlette/middleware/errors.py", line 159, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/middleware/cors.py", line 78, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/exceptions.py", line 82, in __call__
    raise exc from None
  File "/usr/local/lib/python3.7/dist-packages/starlette/exceptions.py", line 71, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.7/dist-packages/starlette/routing.py", line 580, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/routing.py", line 241, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.7/dist-packages/starlette/routing.py", line 52, in app
    response = await func(request)
  File "/usr/local/lib/python3.7/dist-packages/fastapi/routing.py", line 202, in app
    dependant=dependant, values=values, is_coroutine=is_coroutine
  File "/usr/local/lib/python3.7/dist-packages/fastapi/routing.py", line 148, in run_endpoint_function
    return await dependant.call(**values)
  File "/app/controller/train.py", line 109, in _start_train
    reader = FARMReader(model_name_or_path=model_path)
  File "/usr/local/lib/python3.7/dist-packages/haystack/reader/farm.py", line 112, in __init__
    strict=False)
  File "/usr/local/lib/python3.7/dist-packages/farm/infer.py", line 252, in load
    model = BaseAdaptiveModel.load(load_dir=model_name_or_path, device=device, strict=strict)
  File "/usr/local/lib/python3.7/dist-packages/farm/modeling/adaptive_model.py", line 53, in load
    model = cls.subclasses["AdaptiveModel"].load(**kwargs)
  File "/usr/local/lib/python3.7/dist-packages/farm/modeling/adaptive_model.py", line 338, in load
    language_model = LanguageModel.load(load_dir)
  File "/usr/local/lib/python3.7/dist-packages/farm/modeling/language_model.py", line 142, in load
    language_model = cls.subclasses[config["name"]].load(pretrained_model_name_or_path)
  File "/usr/local/lib/python3.7/dist-packages/farm/modeling/language_model.py", line 830, in load
    distilbert.model = DistilBertModel.from_pretrained(farm_lm_model, config=config, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/transformers/modeling_utils.py", line 1208, in from_pretrained
    f"Unable to load weights from pytorch checkpoint file for '{pretrained_model_name_or_path}' "
OSError: Unable to load weights from pytorch checkpoint file for '/model/language_model.bin' at '/model/language_model.bin'If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True. 

@oryx1729 (Contributor):
Hi @bappctl, could it be possible that you're using an older PyTorch version? Can you try with torch v1.8.1?
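
For example, upgrading could look like this (the default PyPI wheel; pick a +cu* build matching your CUDA version if you need a specific one):

    pip install torch==1.8.1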

@bappctl (Author) commented Jul 29, 2021

@oryx1729 I don't see it happening frequently if I reduce the number of workers to 1. Though infrequent, at times I still see the process get killed during save. I can try with 1.8.1.

@oryx1729 (Contributor) commented Jul 30, 2021

Hi @bappctl, model training is a long-running task, so doing it within the REST API is not a recommended approach.

Could you share more details about the use case for triggering model training via an API?

An alternative approach is to train the model with a separate script and later use the trained model for inference with the API.
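
As an illustration, a minimal standalone training script along those lines (the data directory, filename, epoch count, and save path are placeholders):

    from haystack.reader.farm import FARMReader

    # train once, offline; the API process then only loads the saved model for inference
    reader = FARMReader(model_name_or_path="distilbert-base-uncased-distilled-squad",
                        use_gpu=True, num_processes=0)
    reader.train(data_dir="/home/data", train_filename="squad.json",
                 n_epochs=2, dev_split=0, save_dir="/home/data/my_model")

    # later, in the API:
    # reader = FARMReader(model_name_or_path="/home/data/my_model", use_gpu=True)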

@ZanSara (Contributor) commented Oct 14, 2021

Closing as it seems that the issue was solved using PyTorch 1.8.1. Feel free to open a new issue if you still face problems.

ZanSara closed this as completed Oct 14, 2021
@clharman commented Sep 22, 2022

Hey @tholor, loving the FARMReader interface. However, for a single prediction I'm seeing FARMReader run ~6x slower than both TransformersReader and the Hugging Face QA pipeline with num_processes=0 or 1, and ~7.5x slower with num_processes=None. Is there something obvious I'm missing here? Should we expect inference-time parity?

[screenshot]

Using the latest farm-haystack and transformers, with PyTorch 1.12.1. Colab notebook: https://colab.research.google.com/drive/1DmbqWaFw9U4NLzn2dI_u1ypGScKdrGqp?usp=sharing

@tholor (Member) commented Sep 23, 2022

Hey @clharman,

We'd expect some time difference, as the two readers have quite different postprocessing (e.g. tokenizers, handling of no_answers, and aggregating logits across multiple passages; see the docs for more info).

However, the diff here is totally out of the expected range and unacceptable.
I tried to reproduce it in your notebook. For me, the diff is similar in the very first execution of the FARMReader cell but diminishes in the second execution, suggesting that there's an initial warm-up step in the FARMReader costing extra time. Can you confirm that you see the same behaviour?

[screenshot]

There's still a diff between both readers, but this doesn't seem like a "critical bug" to me and might rather be a topic for some thorough profiling + refactoring.
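
For anyone wanting to separate the warm-up cost from the steady-state cost, a minimal timing sketch (assuming a Haystack 1.x install; model, document, and query are placeholders):

    import time
    from haystack.nodes import FARMReader
    from haystack.schema import Document

    reader = FARMReader(model_name_or_path="deepset/roberta-base-squad2",
                        use_gpu=False, num_processes=0)
    docs = [Document(content="Berlin is the capital of Germany.")]

    for i in range(3):
        start = time.perf_counter()
        reader.predict(query="What is the capital of Germany?", documents=docs, top_k=1)
        print(f"run {i}: {time.perf_counter() - start:.2f}s")  # run 0 includes warm-up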

cc @ZanSara

@clharman:
Thanks for the follow-up @tholor. When I ran with a GPU I saw results matching yours, with roughly a 1.5-2x slowdown.

However, running the notebook CPU-only (which is how I was doing it originally), the 6x slowdown persisted after any warm-up period.

Also, leaving num_processes=None on GPU seems to make the gap even wider; I'm seeing it 16x slower. Just FYI, I've been keeping multiprocessing turned off, but it seemed odd.

@tholor (Member) commented Sep 27, 2022

Ok, thanks for the clarification! The diff on CPU might be related to multiprocessing.

@ZanSara @vblagoje can one of you please take over here and try to replicate this? If the gap is consistently that huge on CPU, it might make sense to open a new issue about it.

@vblagoje Weren't you investigating getting rid of multiprocessing anyway?

@vblagoje (Member) commented Sep 27, 2022

Yes @tholor, we got rid of it in preprocessing via #3089, and it is now pending for inference via #3283. For inference there is some internal discussion about keeping multiprocessing after all as a non-default option. Thoughts?

@clharman:
FYI, in case it wasn't clear: the ~6x slowdown I measured was with multiprocessing turned off (num_processes=0 or 1). Maybe your change affects more than just that argument, though. I also tried some basic profiling and found that all the compute time was spent during inference, rather than postprocessing.

[screenshot]

@sjrl (Contributor) commented Oct 5, 2022

Hi @clharman, we believe we resolved the time difference between the two reader types by finishing PR #3283.

There is also further discussion on this in issues #3289 and #3272.

@vblagoje (Member) commented Oct 5, 2022

Hey @clharman, please also have a look at this Colab notebook; I am not getting such high variance in performance results (CPU only). Let us know either way.
