For some models and prompts, the loglikelihood changes with the batch size. #704
Comments
Thanks so much for a really thorough writeup of this! It's really appreciated. I'll see if I can find a more self-contained explanation/reference to point to for why this is the case, but I think it is unfortunately expected on GPU, as certain sums will get executed in different orders and accumulate small error because of the non-associativity of floating point ops. This is something we might be able to improve (but likely not fully fix) if we throw in a |
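As a minimal illustration of the non-associativity mentioned above, the following sketch uses plain Python floats (IEEE 754 double precision) on the CPU. It is only an analogy: the variable names are made up for this example, and it does not reproduce the GPU kernels or the reduced-precision arithmetic a model may actually use.

```python
# Floating-point addition is not associative: the grouping of the
# operands changes where rounding happens, and so can change the result.
left_grouped = (0.1 + 0.2) + 0.3
right_grouped = 0.1 + (0.2 + 0.3)

# The two groupings disagree in the last bits of the result.
print(left_grouped == right_grouped)
print(abs(left_grouped - right_grouped))
```

The discrepancy is tiny for a single addition, but a forward pass performs a very large number of such reductions, so the order in which they run can plausibly shift a reported loglikelihood in its low-order digits.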
I think this problem is in the underlying transformers library, but I'm creating an issue here to document the behavior, as it results in inconsistent evaluation scores. This issue was encountered in #695.
To reproduce:
model: pretrained=EleutherAI/pythia-160m
context: The SWAT team moved in on the compound to prevent the terrorists from launching a deadly missile because the terrorists
continuation: were trying to terrorize the global population.
A batch of four requests
Returns
While a batch of five requests
Returns
The same issue is also present for
model: pretrained=facebook/opt-125m
context: Bush beat Gore because Gore
continuation: was unpopular.
The problem also manifests when running the following command with different tasks:
python main.py --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks $TASKS --batch_size 32
When TASKS=xwinograd_en,xwinograd_fr,xwinograd_jp,xwinograd_pt,xwinograd_ru,xwinograd_zh
When TASKS=xwinograd_en
The xwinograd_en task has different scores between runs, presumably because of incidental differences in batching.
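To make the batching hypothesis concrete, here is a toy sketch of how grouping the same numbers differently changes the order of a reduction. The `chunked_sum` helper and the chunk sizes are hypothetical and chosen only for illustration; this is not how the harness or transformers batches requests, and whether the two results differ at all depends on the data and the Python version.

```python
import math
import random


def chunked_sum(values, chunk_size):
    """Sum each fixed-size chunk first, then sum the partial results."""
    partials = [
        sum(values[i:i + chunk_size])
        for i in range(0, len(values), chunk_size)
    ]
    return sum(partials)


random.seed(0)  # fixed seed so the example is repeatable
values = [random.uniform(-1.0, 1.0) for _ in range(1000)]

# Same numbers, two different groupings (loosely analogous to batch sizes).
sum_chunks_of_4 = chunked_sum(values, 4)
sum_chunks_of_5 = chunked_sum(values, 5)

# math.fsum tracks exact partial sums, so it serves as a reference value.
reference = math.fsum(values)

print(sum_chunks_of_4, sum_chunks_of_5, reference)
print(abs(sum_chunks_of_4 - sum_chunks_of_5))
```

Any difference between the two chunked sums comes purely from rounding order, since the inputs are identical. When comparing evaluation scores across batch sizes, a tolerance-based check such as `math.isclose` is therefore usually more informative than exact equality.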