Error when evaluating mmbench_dev_en: The correct answer according to the 'answer' field in the table should be D, but the log says it is A. #73
Comments
Hi, @jdy18,
Thank you for your attention. I have also encountered an issue when evaluating MME: the response is incorrectly interpreted as "unknown" if the answer contains a word like "notice", even when the response is clearly affirmative, such as 'Yes, the image is a photo of Friedhof Wilmersdorf. The photograph depicts a gravestone adorned with a death notice and an emblem.' The error arises because "notice" contains the substring "no".
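A minimal sketch of one possible fix, assuming the evaluator currently checks for "yes"/"no" with a plain substring test (the actual implementation is not shown in this thread). The function name `extract_yes_no` is hypothetical; it matches "yes" and "no" only as whole words, so "notice" no longer counts as a "no".

```python
import re

# Whole-word pattern: \b prevents a match inside words such as "notice".
_YES_NO = re.compile(r"\b(yes|no)\b")


def extract_yes_no(response: str) -> str:
    """Return 'yes', 'no', or 'unknown' (hypothetical helper).

    'unknown' is returned when neither word appears as a whole word,
    or when both appear and the response is therefore ambiguous.
    """
    found = set(_YES_NO.findall(response.lower()))
    if len(found) == 1:
        return found.pop()
    return "unknown"


example = (
    "Yes, the image is a photo of Friedhof Wilmersdorf. The photograph "
    "depicts a gravestone adorned with a death notice and an emblem."
)

# A plain substring test sees "no" inside "notice".
substring_hit = "no" in example.lower()
result = extract_yes_no(example)
```

With the example response from this comment, `substring_hit` is true while `extract_yes_no` reports an affirmative answer.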
Hi, @jdy18,
Thank you for your prompt attention and your efforts to address the issues I've raised. Your responsiveness is truly appreciated. I'd like to suggest two enhancements to the evaluation of multi-choice tasks. First, it might help to match option letters exactly and only in uppercase. This would avoid confusion caused by words such as the article "a" in a response, which can be mistaken for an option letter and make the response look like it selects multiple choices. Second, when multiple letters or multiple instances of "yes"/"no" appear, the system could give priority to the first word in the sentence to determine the intended response. I am also curious whether the scores currently displayed on the OpenCompass leaderboard have been updated to reflect these latest modifications. Could you provide any information on this?
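The two suggestions above could be sketched as follows. This is only an illustration under stated assumptions: the function name `extract_choice` and the default option set "ABCD" are hypothetical and are not taken from the project's code. It accepts only standalone uppercase letters and returns the first one found.

```python
import re
from typing import Optional


def extract_choice(response: str, options: str = "ABCD") -> Optional[str]:
    """Return the first standalone uppercase option letter, or None.

    Hypothetical helper: `options` lists the valid option letters.
    The lookarounds reject letters that are part of a longer word,
    and matching is case-sensitive, so a lowercase "a" is ignored.
    """
    pattern = rf"(?<![A-Za-z])([{re.escape(options)}])(?![A-Za-z])"
    match = re.search(pattern, response)  # re.search returns the first match
    return match.group(1) if match else None


first = extract_choice("The answer is D, a cat on a sofa.")
none_found = extract_choice("a dog and a cat")
earliest = extract_choice("(B) is correct, not C.")
```

One limitation of this sketch: a response that begins with the capitalized article, such as "A cat is sitting on the sofa", would still be read as option A, so a stricter rule would be needed for free-form answers.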
The correct answer according to the 'answer' field in the table should be D, but the log says it is A.