
truthfulqa_mc2 is Nan, while truthfulqa_mc1 is 1.00 #714

Open
chi2liu opened this issue Jul 31, 2023 · 5 comments

Comments


chi2liu commented Jul 31, 2023

I fine-tuned a model based on llama-2-hf and ran the evaluation with the command below, and truthfulqa_mc2 is NaN while truthfulqa_mc1 is 1.00.

What does that mean?

python main.py --model hf-causal-experimental --model_args pretrained=../mamba-gpt-7b-v2 --tasks anli_r1,anli_r2,anli_r3,arc_challenge,arc_easy,boolq,hellaswag,openbookqa,piqa,record,rte,truthfulqa_mc,wic,winogrande --device cuda:0

hf-causal-experimental (pretrained=../mamba-gpt-7b-v2), limit: None, provide_description: False, num_fewshot: 0, batch_size: None

Task           Version  Metric    Value    Stderr
anli_r1        0        acc       0.3340   ± 0.0149
anli_r2        0        acc       0.3340   ± 0.0149
anli_r3        0        acc       0.3350   ± 0.0136
arc_challenge  0        acc       0.2270   ± 0.0122
                        acc_norm  0.2270   ± 0.0122
arc_easy       0        acc       0.2508   ± 0.0089
                        acc_norm  0.2508   ± 0.0089
boolq          1        acc       0.3783   ± 0.0085
hellaswag      0        acc       0.2504   ± 0.0043
                        acc_norm  0.2504   ± 0.0043
openbookqa     0        acc       0.2760   ± 0.0200
                        acc_norm  0.2760   ± 0.0200
piqa           0        acc       0.4951   ± 0.0117
                        acc_norm  0.4951   ± 0.0117
record         0        f1        0.1186   ± 0.0032
                        em        0.1151   ± 0.0032
rte            0        acc       0.5271   ± 0.0301
truthfulqa_mc  1        mc1       1.0000   ± 0.0000
                        mc2       NaN      ± NaN
wic            0        acc       0.5000   ± 0.0141
winogrande     0        acc       0.4957   ± 0.0141
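For context on how an mc2 of NaN can arise: the TruthfulQA mc2 metric is, roughly, the normalized probability mass the model assigns to the true answer options. The sketch below is a simplified stand-in for the harness's metric (names like `mc2_score` are my own, not the harness's API); it shows that if every answer's log-likelihood is -inf or NaN (as happens when a broken checkpoint emits NaN logits), the normalization becomes 0/0 and the score is NaN, while an argmax-based metric like mc1 can still return a degenerate value.

```python
import numpy as np

def mc2_score(ll_true, ll_false):
    """Simplified MC2-style score: total normalized probability mass
    assigned to the true answer options, given per-option log-likelihoods."""
    p_true = np.exp(np.array(ll_true, dtype=np.float64))
    p_false = np.exp(np.array(ll_false, dtype=np.float64))
    # If all log-likelihoods are -inf or NaN (e.g. NaN logits from a
    # corrupted model), the denominator is 0 or NaN and the score is NaN.
    return float(np.sum(p_true / (np.sum(p_true) + np.sum(p_false))))

# Healthy model: finite log-likelihoods give a finite score in (0, 1).
print(mc2_score([-1.0, -2.0], [-1.5, -3.0]))

# Broken model: all-(-inf) log-likelihoods give 0/0 = NaN.
print(mc2_score([float("-inf")] * 2, [float("-inf")] * 2))  # → nan
```

Combined with the near-random accuracy on every other task above, this pattern suggests the checkpoint's outputs are degenerate rather than a bug in the metric itself.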
@505707566

I have the same issue! But in my code I did some operations that change or move the LoRA weights.
Have you solved it?
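Since moving or merging LoRA weights can silently corrupt a checkpoint, a quick sanity check is to scan the weights for non-finite values before evaluating. This is a minimal sketch using a toy dict of arrays as a stand-in for a real model's state dict (`find_nonfinite` and the weight names are hypothetical, not part of any library):

```python
import numpy as np

def find_nonfinite(params: dict) -> list:
    """Return the names of weight arrays containing NaN/Inf values --
    a quick sanity check after merging or moving adapter weights."""
    return [name for name, w in params.items() if not np.isfinite(w).all()]

# Toy state dict: one healthy weight, one corrupted the way a bad merge might.
state = {"lora_A": np.ones((2, 2)), "lora_B": np.full((2, 2), np.nan)}
print(find_nonfinite(state))  # → ['lora_B']
```

With a real model you would run the same check over its state dict; any hit explains NaN metrics downstream.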


@lintangsutawika
Contributor

This issue should be solved in the main branch.

@hahmad2008

@lintangsutawika I used the main branch and the issue is still there.
Opened issue #1340

@choco9966

choco9966 commented Apr 23, 2024

@lintangsutawika How can this be fixed? Can you share the PR? Thanks.

@haileyschoelkopf
Contributor

@choco9966 can you share a public model + sample command that reproduces this issue?
