When using Accelerate for data parallel inference, using different numbers of GPUs results in different results #1719
Comments
It's probably because of #1308. So the fewshot samples used for a particular …
Hi, thank you for your timely help, which was very helpful! By selecting …
Hi, have you implemented the approach mentioned in #1308? Can you share it?
Perhaps you can refer to this …
Hi @haileyschoelkopf, thank you for your awesome open-source work. We have been evaluating with `lm-eval` and noticed that when using `accelerate` for data parallel inference, the number of GPUs used leads to varying results, and the deviation between these results is greater than the stderr (about 0.012x). We have conducted extensive evaluations on Winogrande using the same settings as the Open LLM Leaderboard, with `num_fewshot=5` and `batch_size=1`. Here are the results we obtained:
Script for 5-shot inference with 1 GPU:

```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch -m lm_eval --model hf \
  --model_args pretrained=allenai/tulu-2-dpo-7b,trust_remote_code=True,dtype="bfloat16" \
  --tasks winogrande \
  --num_fewshot 5 \
  --batch_size 1
```
Script for 5-shot inference with 4 GPUs:

```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m lm_eval --model hf \
  --model_args pretrained=allenai/tulu-2-dpo-7b,trust_remote_code=True,dtype="bfloat16" \
  --tasks winogrande \
  --num_fewshot 5 \
  --batch_size 1
```
We believe this might be due to `num_fewshot`. When we set `num_fewshot=0`, we obtain a stable result: 0.6993.

Script for 0-shot inference with 1 GPU:
```shell
CUDA_VISIBLE_DEVICES=0 accelerate launch -m lm_eval --model hf \
  --model_args pretrained=allenai/tulu-2-dpo-7b,trust_remote_code=True,dtype="bfloat16" \
  --tasks winogrande \
  --num_fewshot 0 \
  --batch_size 1
```
Script for 0-shot inference with 4 GPUs:
```shell
CUDA_VISIBLE_DEVICES=0,1,2,3 accelerate launch -m lm_eval --model hf \
  --model_args pretrained=allenai/tulu-2-dpo-7b,trust_remote_code=True,dtype="bfloat16" \
  --tasks winogrande \
  --num_fewshot 0 \
  --batch_size 1
```
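If the few-shot hypothesis from #1308 is correct, the mechanism could look like the following minimal Python sketch. All names here are hypothetical illustrations, not lm-eval internals: it assumes few-shot examples are drawn from an RNG stream that advances once per document within each rank's shard, so the same document can receive different few-shot draws depending on how documents were sharded across GPUs.

```python
# Hypothetical sketch (NOT lm-eval's actual implementation): how the
# number of GPUs could change few-shot results if the few-shot RNG
# stream advances per document within each rank's shard.
import random

FEWSHOT_POOL = [f"example_{i}" for i in range(100)]  # candidate few-shot examples
DOCS = list(range(10))                               # evaluation documents

def draw_fewshot(rng, k=5):
    # Each call advances the RNG stream, so a document's draw depends on
    # how many documents were processed before it on the same rank.
    return rng.sample(FEWSHOT_POOL, k)

def run(world_size):
    contexts = {}
    for rank in range(world_size):
        rng = random.Random(1234)        # identical seed on every rank...
        shard = DOCS[rank::world_size]   # ...but a different document subset
        for doc in shard:
            contexts[doc] = draw_fewshot(rng)
    return contexts

one_gpu = run(1)
four_gpus = run(4)

# Document 1 is the *second* doc on a single GPU but the *first* doc on
# rank 1 of four GPUs, so it consumes a different point in the RNG stream:
# its 4-GPU draw equals the draw that the 1-GPU run gave to document 0.
assert four_gpus[1] == one_gpu[0]
```

This would also explain why `num_fewshot=0` is stable across GPU counts: with no few-shot sampling, the prompt for each document no longer depends on its position in a rank's shard.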
Our environment:

```
accelerate=0.27.2
transformers=4.36.2
lm_eval=0.4.0
commit 89618bf8421d27c8cf28004d616b33fc5b305ceb (HEAD -> main, origin/main, origin/HEAD)
```
Furthermore, we have evaluated on other servers and with the latest version, with similar observations.
Thank you in advance for your assistance!