
I'm getting 54 sentences/s on inference with BERT on a T4 GPU, is that good? #368

Closed
renaud opened this issue May 19, 2020 · 4 comments

@renaud
Contributor

renaud commented May 19, 2020

Question

I am getting around 54 sentences/s on inference for text classification.

What do you think? Is that good? How does this compare with what you get?

Additional context

lang_model = "bert-base-cased"
do_lower_case = False
max_seq_len = 64
use_amp = None
  • GCP
  • 1 x T4 GPU
  • n1-standard-2 (2 vCPUs, 7.5 GB memory)
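
For context, a minimal sketch of how such a measurement could look with FARM's Inferencer (an assumed reconstruction; the actual benchmark script is not included in the issue, and the model path is a placeholder for a fine-tuned classifier):

import time
from farm.infer import Inferencer

# Placeholder path for a fine-tuned text classification model (assumption).
model = Inferencer.load(
    "saved_models/bert-base-cased-classifier",
    task_type="text_classification",
    max_seq_len=64,
    batch_size=32,   # not reported in the issue; chosen for illustration
    gpu=True,
)

dicts = [{"text": "An example sentence to classify."}] * 1000  # dummy inputs
start = time.perf_counter()
model.inference_from_dicts(dicts=dicts)
elapsed = time.perf_counter() - start
print(f"{len(dicts) / elapsed:.1f} sentences/s")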
@renaud renaud added the question Further information is requested label May 19, 2020
@Timoeller Timoeller self-assigned this May 25, 2020
@Timoeller
Contributor

Hey @renaud
we are currently doing a lot of inference benchmarking for Question Answering, as described in deepset-ai/haystack/issues/39, where we also compare PyTorch vs ONNX.

Concerning your throughput: I think it is pretty slow, but I am not sure how a T4 GPU performs compared to the V100s we used. One important parameter is the batch size. Did you test different batch size values?

Looking at Tanay's post, a batch of size 64 takes 0.1621 seconds to complete on a V100, which works out to about 395 samples per second. And that is for QA, where inference is much more complex (a lot of communication between the GPU and CPU). Simple text classification should be faster; my intuitive guess would be by a factor of 2-5x...

Happy to interact here and make text classification inference faster together with you!
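
As a quick check of the arithmetic quoted above (not from the issue, just spelling out the numbers):

batch_size = 64
seconds_per_batch = 0.1621             # V100 timing quoted from the Haystack QA benchmark
print(batch_size / seconds_per_batch)  # ~394.8 samples/s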

@Timoeller
Contributor

I did some speed benchmarking on text classification inference.
I used a V100 GPU and tested various batch size + max_seq_len values with inference on 1000 texts:

  • seq len 256, (batch size, seconds)
    [(1, 15.71), (10, 5.47), (20, 5.31), (30, 5.31)]
  • seq len 128, (batch size, seconds)
    [(1, 14.99), (10, 3.48), (20, 3.73), (30, 3.57)]

So dividing 1000 / 3.57 we get ~280 samples/second for seq len 128 and batch size 30.
I would suggest you try increasing the batch size. A T4 will still be slower than a V100, but 54 samples/s is really low.
I also realized that I might be wrong about text classification inference being faster than QA inference; the numbers are comparable to a recent QA inference benchmark.
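
A sketch of how such a batch-size sweep could be scripted with FARM's Inferencer (an assumed reconstruction; the exact benchmark script and model are not attached to this issue):

import time
from farm.infer import Inferencer

texts = [{"text": "A benchmark sentence for timing."}] * 1000  # stand-in for the 1000 texts

for batch_size in (1, 10, 20, 30):
    # Placeholder model path (assumption); reload per setting so batch_size takes effect.
    model = Inferencer.load(
        "saved_models/bert-base-cased-classifier",
        task_type="text_classification",
        max_seq_len=128,
        batch_size=batch_size,
        gpu=True,
    )
    start = time.perf_counter()
    model.inference_from_dicts(dicts=texts)
    elapsed = time.perf_counter() - start
    print(f"batch_size={batch_size}: {elapsed:.2f} s "
          f"({len(texts) / elapsed:.0f} samples/s)")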

@Timoeller
Contributor

OK, why talk about intuition when one can also just check.

I tested QA inference vs. text classification:

Text classification on 1000 docs, max_seq_len 128:
Batch size 1: 14.947 s
Batch size 3: 5.801 s
Batch size 6: 3.904 s
Batch size 10: 3.771 s
Batch size 20: 3.758 s
Batch size 30: 3.667 s

Question Answering on 1000 questions, max_seq_len 128 (doc + question just below 128 tokens):
Batch size 1: 16.096 s
Batch size 3: 6.172 s
Batch size 6: 5.290 s
Batch size 10: 4.951 s
Batch size 20: 5.044 s
Batch size 30: 4.930 s

So QA inference seems a bit slower than text classification inside FARM (0.4.4).
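
For completeness, a sketch of how the two task types are called through the same Inferencer API (assumed; the QA model name is only an example, and the input dict keys follow the FARM examples from around this version and may differ in later releases):

from farm.infer import Inferencer

# Question Answering: each dict carries the question(s) plus the context passage.
qa_model = Inferencer.load(
    "deepset/bert-base-cased-squad2",   # example QA model, not necessarily the one benchmarked
    task_type="question_answering",
    max_seq_len=128,
    batch_size=30,
    gpu=True,
)
qa_input = [{
    "qas": ["Who counted the game among the best ever made?"],
    "context": "Twilight Princess was released to universal critical acclaim ...",
}]
result = qa_model.inference_from_dicts(dicts=qa_input)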

@Timoeller
Contributor

Closing this now due to inactivity; feel free to reopen.
