I'm getting 54 sentences/s on inference with BERT on T4 GPU, is that good? #368
Question
I am getting around 54 sentences/s on inference for text classification. What do you think? Is that good? Does this compare with what you get?
Comments
Hey @renaud, concerning your throughput: looking at Tanay's post, it takes 0.1621 seconds for a batch of size 64 to complete on a V100. That makes about 395 samples per second. And this is for QA, where inference is much more complex (a lot of communication between GPU and CPU). Simple text classification should be faster; my intuitive guess would be by a factor of 2-5x. Happy to interact here and make the text classification inference faster together with you!
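For reference, the arithmetic behind that estimate is just batch size divided by per-batch latency:

```python
# Throughput from a timed batch: samples/s = batch_size / latency_per_batch.
batch_size = 64
latency_per_batch = 0.1621  # seconds for one batch of 64 on a V100 (QA)

print(batch_size / latency_per_batch)  # ≈ 394.8, i.e. about 395 samples/s
```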
I did some speed benchmarking on text classification inference. Dividing 1000 docs by 3.57 seconds gives about 280 samples/second for seq len = 128 and batch size = 30.
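If anyone wants to reproduce a number like this, a minimal timing sketch along the following lines should do. The model path is a placeholder for your own fine-tuned text classification checkpoint, and the load arguments follow FARM's Inferencer as of the 0.4.x releases:

```python
import time
from farm.infer import Inferencer

# Placeholder path: point this at your own fine-tuned text classification model.
inferencer = Inferencer.load(
    "path/to/your_textclassification_model",
    task_type="text_classification",
    batch_size=30,
    max_seq_len=128,
    gpu=True,
)

# 1000 dummy docs, mirroring the benchmark above.
dicts = [{"text": "This is a sample sentence for timing inference."}] * 1000

start = time.perf_counter()
inferencer.inference_from_dicts(dicts=dicts)
elapsed = time.perf_counter() - start

print(f"{len(dicts) / elapsed:.0f} samples/s")  # ~280 on a V100 in the run above
```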
OK, why talk about intuition when one can just check? I tested QA inference vs. text classification: text classification on 1000 docs, max seq len 128; question answering on 1000 questions, max seq len 128 (doc + question just below 128 tokens). So QA inference seems a bit slower than text classification inside FARM (0.4.4).
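The QA side of the comparison can be timed the same way; only the task type and the input dict format change. The model path is again a placeholder, and the dict layout assumes FARM's QA input format with qas/context keys:

```python
from farm.infer import Inferencer

# Placeholder path: a model with a QA prediction head (e.g. fine-tuned on SQuAD).
qa_inferencer = Inferencer.load(
    "path/to/your_qa_model",
    task_type="question_answering",
    batch_size=30,
    max_seq_len=128,
    gpu=True,
)

# QA input dicts pair a list of questions with a context passage.
qa_dicts = [{
    "qas": ["What was measured?"],
    "context": "A short passage that, together with the question, stays just below 128 tokens.",
}] * 1000

qa_inferencer.inference_from_dicts(dicts=qa_dicts)  # time this call as above
```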
Closing this now for inactivity, feel free to reopen.