Improve MMMU performance with prompt engineering (openai#1450)
With this improvement we now have a 0-shot performance of 59.6%
(averaged over 3 eval runs) on the MMMU validation set, which beats the
56.8% reported in the [MMMU paper](https://arxiv.org/pdf/2311.16502.pdf).
etr2460 committed Jan 3, 2024
1 parent f1bb7cb commit 2981e65
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion evals/elsuite/mmmu/eval.py
@@ -88,7 +88,7 @@ def eval_sample(self, sample: Sample, rng):
                 rng=rng,
             )
             prompt = sample.question + "\n" + options
-            system_prompt = f'You are an expert in {self.subject} whose job is to answer questions from the user using images. First, reason about the correct answer. Then write the answer in the following format where X is exactly one of A,B,C,D: "ANSWER: X"'
+            system_prompt = f'You are an expert in {self.subject} whose job is to answer questions from the user using images. First, reason about the correct answer. Then write the answer in the following format where X is exactly one of A,B,C,D: "ANSWER: X". If you are uncertain of the correct answer, guess the most likely one.'
         else:
             correct_answer = sample.label
             prompt = sample.question
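
A rough sketch of why the added sentence can help, assuming answers are scored by pulling the letter out of the "ANSWER: X" line. The helper names (`build_system_prompt`, `extract_answer`, `accuracy`) and the regex are hypothetical and are not taken from evals/elsuite/mmmu/eval.py; only the prompt wording comes from the diff above.

```python
import re
from typing import List, Optional


def build_system_prompt(subject: str) -> str:
    # Hypothetical helper; the wording is copied from the new system_prompt in the diff.
    return (
        f"You are an expert in {subject} whose job is to answer questions from "
        "the user using images. First, reason about the correct answer. Then "
        "write the answer in the following format where X is exactly one of "
        'A,B,C,D: "ANSWER: X". If you are uncertain of the correct answer, '
        "guess the most likely one."
    )


# Assumed parsing rule (not the eval's actual code): a single letter A-D after "ANSWER:".
ANSWER_RE = re.compile(r"ANSWER:\s*([A-D])\b")


def extract_answer(completion: str) -> Optional[str]:
    # Use the last match so letters mentioned during reasoning are ignored.
    matches = ANSWER_RE.findall(completion)
    return matches[-1] if matches else None


def accuracy(completions: List[str], labels: List[str]) -> float:
    # A completion with no parseable answer (e.g. a refusal) counts as wrong.
    correct = sum(extract_answer(c) == label for c, label in zip(completions, labels))
    return correct / len(labels)


if __name__ == "__main__":
    completions = [
        "The circuit is a voltage divider. ANSWER: B",
        "I cannot determine the answer from the image.",  # no guess, scored as wrong
        "Probably the third option. ANSWER: C",
    ]
    print(accuracy(completions, ["B", "A", "C"]))
```

Under this scoring assumption, a reply that declines to answer is always marked wrong, while a forced guess among four options still has a chance of being right, which is one plausible reading of where the improvement comes from.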