Caikit NLP Runtime Performance Benchmarks

Runtime performance benchmarking results for various models on various hardware configurations.

Llama2-7b

| Date Executed | Hardware | Training Set | Epoch | Precision | Batch Size | Max Source Length | Training Runtime (s) | Samples Per Second | Train Steps Per Second | Loss | Notes |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 2023-09-05 | 1 x A100 80GB | Glue / RTE | 1 | bfloat16 | 6 | 4096 | 350 | 21.325 | 0.22 | 1.65 | 4096 is the context size for Llama2 |
| 2023-09-05 | 1 x A100 80GB | Glue / RTE | 1 | bfloat16 | 6 | 1024 | 350 | 21.333 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
| 2023-09-06 | 1 x A100 80GB | Glue / RTE | 1 | bfloat16 | 6 | 512 | 348 | 21.44 | 0.22 | 1.65 | batch size of 7 fails CUDA OOM |
| 2023-09-05 | 1 x A100 80GB | Glue / RTE | 1 | bfloat16 | 8 | 256 | 356 | 20.939 | 0.16 | 1.70 | batch size of 9 fails CUDA OOM |
| 2023-09-05 | 1 x A100 80GB | Glue / RTE | 1 | bfloat16 | 19 | 128 | 254 | 29.332 | 0.09 | 1.94 | batch size of 20 fails CUDA OOM |
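
To compare the runs above programmatically, the following is a minimal sketch that copies the batch size, max source length, runtime, and samples-per-second values from the table and ranks them by throughput. The `Run` class and the `fastest` and `speedup_over` helpers are hypothetical names introduced here for illustration; they are not part of Caikit NLP.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One benchmark row (hypothetical container, not a Caikit NLP type)."""
    batch_size: int
    max_source_length: int
    runtime_s: float
    samples_per_second: float


# Values copied from the Llama2-7b table (1 x A100 80GB, Glue / RTE, bfloat16).
RUNS = [
    Run(6, 4096, 350, 21.325),
    Run(6, 1024, 350, 21.333),
    Run(6, 512, 348, 21.44),
    Run(8, 256, 356, 20.939),
    Run(19, 128, 254, 29.332),
]


def fastest(runs):
    """Return the run with the highest reported samples per second."""
    return max(runs, key=lambda r: r.samples_per_second)


def speedup_over(baseline, run):
    """Ratio of a run's reported throughput to a baseline run's throughput."""
    return run.samples_per_second / baseline.samples_per_second


if __name__ == "__main__":
    best = fastest(RUNS)
    print(f"fastest: batch size {best.batch_size}, "
          f"max source length {best.max_source_length}")
    print(f"speedup over first row: {speedup_over(RUNS[0], best):.2f}x")
```

The comparison uses only the reported samples-per-second column, so it says nothing about loss, which is also highest (1.94) for the fastest configuration in the table.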