
I'm getting 54 sentences/s on inference with BERT on a T4 GPU, is that good? #368

Closed
renaud opened this issue May 19, 2020 · 4 comments

@renaud
Contributor

renaud commented May 19, 2020

Question

I am getting around 54 sentences/s on inference for text classification.

What do you think? Is that good? How does this compare with what you get?

Additional context

lang_model = "bert-base-cased"
do_lower_case = False
max_seq_len = 64
use_amp = None
  • GCP
  • 1 x T4 GPU
  • n1-standard-2 (2 vCPUs, 7.5 GB memory)
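
For context, a minimal sketch of how such a measurement could look with FARM's Inferencer (an assumed reconstruction; the actual benchmark script is not included in the issue, and the model path is a placeholder for a fine-tuned classifier):

import time
from farm.infer import Inferencer

# Placeholder path for a fine-tuned text classification model (assumption).
model = Inferencer.load(
    "saved_models/bert-base-cased-classifier",
    task_type="text_classification",
    max_seq_len=64,
    batch_size=32,   # not reported in the issue; chosen for illustration
    gpu=True,
)

dicts = [{"text": "An example sentence to classify."}] * 1000  # dummy inputs
start = time.perf_counter()
model.inference_from_dicts(dicts=dicts)
elapsed = time.perf_counter() - start
print(f"{len(dicts) / elapsed:.1f} sentences/s")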
@renaud renaud added the question Further information is requested label May 19, 2020
@Timoeller Timoeller self-assigned this May 25, 2020
@Timoeller
Contributor

Hey @renaud
we are currently doing a lot of inference benchmarking for Question Answering, as described in deepset-ai/haystack/issues/39, where we also compare PyTorch vs ONNX.

Concerning your throughput: I think it is pretty slow, but I am not sure how a T4 GPU performs compared to the V100s we used. One important parameter is the batch size. Did you test different batch size values?

Looking at Tanay's post, a batch of size 64 takes 0.1621 seconds to complete on a V100, which works out to about 395 samples per second. And that is for QA, where inference is much more complex (a lot of communication between the GPU and CPU). Simple text classification should be faster; my intuitive guess would be by a factor of 2-5x...

Happy to interact here and make text classification inference faster together with you!
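
As a quick check of the arithmetic quoted above (not from the issue, just spelling out the numbers):

batch_size = 64
seconds_per_batch = 0.1621             # V100 timing quoted from the Haystack QA benchmark
print(batch_size / seconds_per_batch)  # ~394.8 samples/s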

@Timoeller
Contributor

I did some speed benchmarking on text classification inference.
I used a V100 GPU and tested various batch size + max_seq_len values with inference on 1000 texts:

  • seq len 256, (batch size, seconds)
    [(1, 15.71), (10, 5.47), (20, 5.31), (30, 5.31)]
  • seq len 128, (batch size, seconds)
    [(1, 14.99), (10, 3.48), (20, 3.73), (30, 3.57)]

So dividing 1000 / 3.57 we get ~280 samples/second for seq len 128 and batch size 30.
I would suggest you try increasing the batch size. A T4 will still be slower than a V100, but 54 samples/s is really low.
I also realized that I might be wrong about text classification inference being faster than QA inference; the numbers are comparable to a recent QA inference benchmark.
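
A sketch of how such a batch-size sweep could be scripted with FARM's Inferencer (an assumed reconstruction; the exact benchmark script and model are not attached to this issue):

import time
from farm.infer import Inferencer

texts = [{"text": "A benchmark sentence for timing."}] * 1000  # stand-in for the 1000 texts

for batch_size in (1, 10, 20, 30):
    # Placeholder model path (assumption); reload per setting so batch_size takes effect.
    model = Inferencer.load(
        "saved_models/bert-base-cased-classifier",
        task_type="text_classification",
        max_seq_len=128,
        batch_size=batch_size,
        gpu=True,
    )
    start = time.perf_counter()
    model.inference_from_dicts(dicts=texts)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s "
          f"({len(texts) / elapsed:.0f} samples/s)")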

@Timoeller
Contributor

OK, why talk about intuition when one can also just check.

I tested QA inference vs. text classification:

Text classification on 1000 docs, max_seq_len 128:
Batch size 1: 14.947 s
Batch size 3: 5.801 s
Batch size 6: 3.904 s
Batch size 10: 3.771 s
Batch size 20: 3.758 s
Batch size 30: 3.667 s

Question Answering on 1000 questions, max_seq_len 128 (doc + question just below 128 tokens):
Batch size 1: 16.096 s
Batch size 3: 6.172 s
Batch size 6: 5.290 s
Batch size 10: 4.951 s
Batch size 20: 5.044 s
Batch size 30: 4.930 s

So QA inference seems a bit slower than text classification inside FARM (0.4.4).
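
For completeness, a sketch of how the two task types are called through the same Inferencer API (assumed; the QA model name is only an example, and the input dict keys follow the FARM examples from around this version and may differ in later releases):

from farm.infer import Inferencer

# Question Answering: each dict carries the question(s) plus the context passage.
qa_model = Inferencer.load(
    "deepset/bert-base-cased-squad2",   # example QA model, not necessarily the one benchmarked
    task_type="question_answering",
    max_seq_len=128,
    batch_size=30,
    gpu=True,
)
qa_input = [{
    "qas": ["Who counted the game among the best ever made?"],
    "context": "Twilight Princess was released to universal critical acclaim ...",
}]
result = qa_model.inference_from_dicts(dicts=qa_input)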

@Timoeller
Contributor

Closing this now due to inactivity; feel free to reopen.
