
The scores calculated by VLMEvalKit differ from the scores calculated on the MMBench website #121

Closed
jdy18 opened this issue Mar 20, 2024 · 4 comments

Comments


jdy18 commented Mar 20, 2024

MMBench_DEV_EN_openai_result.xlsx

When evaluating the predictions in this file:

The result on the MMBench website:
[screenshot of the MMBench website evaluation result]

By VLMEvalKit:
"split","Overall","AR","CP","FP-C","FP-S","LR","RR",
"dev","0.7328178694158075","0.7638190954773869","0.8277027027027027","0.6223776223776224","0.7610921501706485","0.4745762711864407","0.7652173913043478"

kennymckormick (Member) commented

Hi @jdy18,
That looks weird. Please share the original prediction file with me so I can get more information.


jdy18 commented Mar 20, 2024

> Hi @jdy18, that looks weird. Please share the original prediction file with me so I can get more information.

MMBench_DEV_EN.xlsx


jdy18 commented Mar 20, 2024

> Hi @jdy18, that looks weird. Please share the original prediction file with me so I can get more information.

I think I have found the reason: I uploaded the openai_result table to the MMBench website instead of the original prediction file. Do you know what difference between the two files leads to the different evaluation results?

kennymckormick (Member) commented

Oh, you cannot upload the openai_result table for evaluation. It only includes one pass for each question, so the corresponding evaluation result is computed under the VanillaEval setting.
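
To make the distinction concrete, below is a minimal sketch of the two settings; it is not VLMEvalKit's actual implementation. It assumes a prediction table with `index`, `answer` (ground truth), and `prediction` columns, and it assumes that the circular copies of a base question share `index % 1_000_000`, which matches the usual MMBench convention but should be verified against your own file.

```python
# A minimal sketch of VanillaEval vs. CircularEval, not VLMEvalKit's actual code.
# Assumptions: `df` has one row per prediction with columns 'index',
# 'answer' (ground truth) and 'prediction', and circular copies of the same
# base question share `index % 1_000_000` (verify against your own file).
import pandas as pd

def vanilla_acc(df: pd.DataFrame) -> float:
    # VanillaEval: every pass is scored on its own, which is all a
    # 1-pass openai_result file can support.
    return float((df['prediction'] == df['answer']).mean())

def circular_acc(df: pd.DataFrame) -> float:
    # CircularEval: a question counts as correct only if every rotated
    # copy of it is answered correctly.
    per_question = (
        df.groupby(df['index'] % 1_000_000)
          .apply(lambda g: bool((g['prediction'] == g['answer']).all()))
    )
    return float(per_question.mean())
```

Because CircularEval only credits a question when every rotated copy is answered correctly, it is the stricter setting, so a 1-pass file scored under VanillaEval will generally not match the CircularEval number reported for the full prediction file.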
