
The scores calculated by VLMEvalKit differ from the scores calculated on the MMBench website #121

Closed
jdy18 opened this issue Mar 20, 2024 · 4 comments

Comments


jdy18 commented Mar 20, 2024

MMBench_DEV_EN_openai_result.xlsx

When evaluating the predictions in this file:

The result on the MMBench website:
[screenshot of the MMBench website evaluation result]

By VLMEvalKit:
"split","Overall","AR","CP","FP-C","FP-S","LR","RR",
"dev","0.7328178694158075","0.7638190954773869","0.8277027027027027","0.6223776223776224","0.7610921501706485","0.4745762711864407","0.7652173913043478"

kennymckormick (Member) commented

Hi @jdy18,
That looks weird. Please share the original prediction file with me so I can get more information.


jdy18 commented Mar 20, 2024

> Hi @jdy18, that looks weird. Please share the original prediction file with me so I can get more information.

MMBench_DEV_EN.xlsx


jdy18 commented Mar 20, 2024

> Hi @jdy18, that looks weird. Please share the original prediction file with me so I can get more information.

I think I have found the reason: I uploaded the openai_result table to the MMBench website instead of the original prediction file. Do you know what difference between the two files leads to the different evaluation results?

kennymckormick (Member) commented

Oh, you cannot upload the openai_result table for evaluation. It only includes one pass for each question, so the corresponding evaluation result is computed under the VanillaEval setting.
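
To make the distinction concrete, below is a minimal sketch of the two settings; it is not VLMEvalKit's actual implementation. It assumes a prediction table with `index`, `answer` (ground truth), and `prediction` columns, and it assumes that the circular copies of a base question share `index % 1_000_000`, which matches the usual MMBench convention but should be verified against your own file.

```python
# A minimal sketch of VanillaEval vs. CircularEval, not VLMEvalKit's actual code.
# Assumptions: `df` has one row per prediction with columns 'index',
# 'answer' (ground truth) and 'prediction', and circular copies of the same
# base question share `index % 1_000_000` (verify against your own file).
import pandas as pd

def vanilla_acc(df: pd.DataFrame) -> float:
    # VanillaEval: every pass is scored on its own, which is all a
    # 1-pass openai_result file can support.
    return float((df['prediction'] == df['answer']).mean())

def circular_acc(df: pd.DataFrame) -> float:
    # CircularEval: a question counts as correct only if every rotated
    # copy of it is answered correctly.
    per_question = (
        df.groupby(df['index'] % 1_000_000)
          .apply(lambda g: bool((g['prediction'] == g['answer']).all()))
    )
    return float(per_question.mean())
```

Because CircularEval only credits a question when every rotated copy is answered correctly, it is the stricter setting, so a 1-pass file scored under VanillaEval will generally not match the CircularEval number reported for the full prediction file.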
