- An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
We run inference with a CausalLM model, providing the same text in different batches: one batch of size 1 and another of size > 1. The output logits for the same input sequence differ slightly between the two.
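A minimal sketch of the comparison we run (the checkpoint name `gpt2`, the batch size, and the `max_abs_diff` helper are illustrative, not from the original report; a CUDA GPU is assumed):

```python
def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two equal-length sequences."""
    return max(abs(x - y) for x, y in zip(a, b))

def compare_batch_logits(model_name="gpt2", text="Hello world", batch_size=4):
    # Imported here so max_abs_diff stays usable without transformers installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype=torch.float16
    ).to("cuda").eval()

    inputs_1 = tokenizer([text], return_tensors="pt").to("cuda")
    inputs_n = tokenizer([text] * batch_size, return_tensors="pt").to("cuda")

    with torch.no_grad():
        logits_1 = model(**inputs_1).logits[0]  # the sequence alone in a batch
        logits_n = model(**inputs_n).logits[0]  # the same sequence inside a larger batch

    return max_abs_diff(logits_1.flatten().tolist(), logits_n.flatten().tolist())

# Example (requires a CUDA GPU):
# print(compare_batch_logits())  # nonzero under float16, ~0 under float32
```

The heavy call is left commented out so the helper can be exercised without a GPU.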
Expected behavior
Output logits should be the same (or at least very close to each other) regardless of the batch size.
Note that we observe this problem only with `torch.float16` and `torch.bfloat16` on GPUs.
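This is consistent with the batched kernels reducing in a different order than the batch-size-1 path: half precision has so little headroom that reassociating an addition can change the rounded result. A self-contained sketch of that effect, using Python's `struct` half-precision format to simulate float16 rounding (the `f16` helper and the constants are illustrative, not from the report):

```python
import struct

def f16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('e', struct.pack('e', x))[0]

# The spacing between representable float16 values at 2048 is 2, so
# 2048 + 1 rounds back down to 2048, while 1 + 1 = 2 is added exactly.
# Summing the same three numbers in a different order gives different results:
left = f16(f16(2048.0 + 1.0) + 1.0)   # ((2048 + 1) + 1) -> 2048.0
right = f16(2048.0 + f16(1.0 + 1.0))  # (2048 + (1 + 1)) -> 2050.0
print(left, right)  # 2048.0 2050.0
```

Under float32 the same sums agree, which would match the observation that the discrepancy disappears at full precision.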
System Info
Linux 20.04.1-Ubuntu x86_64 GNU/Linux
Python 3.10.12
transformers==4.37.1
torch==2.1.2+cu121
GPU A100
NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0
Who can help?
@ArthurZucker
Information
Tasks
The code above works without errors with `float32`. So for some reason the problem occurs for half precision and `batch_size=1` only. I think that this thread might be related somehow, but I'm not sure.