
Output logits differ for the same input text in a batch of size 1 with half precision on GPU #28732

Closed
zhukpm opened this issue Jan 26, 2024 · 3 comments

zhukpm commented Jan 26, 2024

System Info

Linux 20.04.1-Ubuntu x86_64 GNU/Linux
Python 3.10.12
transformers==4.37.1
torch==2.1.2+cu121

GPU A100
NVIDIA-SMI 525.147.05 Driver Version: 525.147.05 CUDA Version: 12.0

Who can help?

@ArthurZucker

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

We run inference with a CausalLM model, providing the same input text in two different batches: one batch of size 1 and one of size > 1. The output logits for the same input sequence differ slightly between the two runs.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed


# MODEL_ID = 'mistralai/Mistral-7B-Instruct-v0.2'
MODEL_ID = 'facebook/opt-350m'


model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    device_map='auto',
    return_dict=True
)
model.eval()

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

batches = [
    ['hello, world'],
    ['hello, world', 'hello', 'world']
]

tokenized = [tokenizer(b, padding='longest', return_tensors='pt').to(model.device) for b in batches]

assert (tokenized[0]['input_ids'][0] == tokenized[1]['input_ids'][0]).all().item()

set_seed(0)
with torch.inference_mode():
    logits = [model(**t).logits for t in tokenized]

# This assertion fails with float16/bfloat16 on GPU (see "Expected behavior" below)
assert torch.allclose(logits[0][0], logits[1][0], atol=1e-3)
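
To see how large the discrepancy actually is, here is a small follow-up sketch (run after the script above) that prints the magnitude of the mismatch instead of only asserting closeness:

# Follow-up sketch: quantify the mismatch between the two runs
diff = (logits[0][0] - logits[1][0]).abs()
print(f'max abs diff:  {diff.max().item():.6f}')
print(f'mean abs diff: {diff.mean().item():.6f}')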

Expected behavior

Output logits should be the same (or at least very close to each other) regardless of the batch size.
Note that we observe this problem only with torch.float16 and torch.bfloat16 on GPUs.

The code above works without errors

  • on CPUs
  • when using float32
  • when comparing batches of sizes e.g. 2 and 3:
batches = [
    ['hello, world', 'hello'],
    ['hello, world', 'hello', 'world']
]

So, for some reason, the problem occurs only with half precision and batch_size=1.

I think that this thread might be related somehow, but I'm not sure.
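
A minimal probe that tries to isolate this at the matmul level (this assumes the discrepancy comes from batch-size-dependent GPU kernel selection / reduction order at half precision, which is only a guess on my side):

import torch

torch.manual_seed(0)
x = torch.randn(3, 16, 1024, dtype=torch.bfloat16, device='cuda')
w = torch.randn(1024, 1024, dtype=torch.bfloat16, device='cuda')

out_single = x[:1] @ w      # first sequence alone, batch of size 1
out_batched = (x @ w)[:1]   # same sequence taken from a batch of size 3

# Whether these match bit-for-bit depends on the GPU and the kernels cuBLAS selects
print((out_single - out_batched).abs().max().item())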

@ArthurZucker (Collaborator)

This seems like a duplicate of #25420 (comment)


zhukpm commented Jan 29, 2024

Yeah, it is. Thanks
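
For anyone hitting the same thing: the strict atol=1e-3 in the repro is tighter than what half precision generally guarantees, so a looser, dtype-aware comparison could look like the sketch below (the exact tolerances that pass will depend on the model, GPU and kernels):

# Sketch: compare with tolerances better suited to bf16/fp16 than the strict atol=1e-3 above
close = torch.allclose(logits[0][0], logits[1][0], rtol=1e-2, atol=1e-2)
print('logits match within half-precision-level tolerance:', close)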


This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.
