
OLMES Standard Compliance #948

Open
6 of 18 tasks
elronbandel opened this issue Jun 25, 2024 · 0 comments
OLMES includes the following elements, justified in detail above:

  • Use the test set when available, otherwise the validation set. Sample 1000 instances if there are more than 1500

  • Use specified, exact prompt format (Section 3.1)

  • Use fixed, curated 5-shot examples (Section 3.2)

    • balanced
    • verified
  • Evaluate with both MCF and CF, use the best result (Section 3.4)

    • add CF template (choices specified)
    • add MCF template (choices and numerals are not specified)
    • choose the best between CF and MCF at system level
  • Follow recommendations for all other evaluation details:

    • For MMLU: use macro average (over 57 tasks) rather than micro average (over 14042 instances), following AI@Meta (2024). This better represents the diversity of fields in the dataset, although in practice it does not generally make a big difference (see Figure 7).
    • When a model requires it, make sure to add the appropriate token at the start of the prompt (e.g., Gemma (Gemma Team et al., 2024)).
    • When using the “character” normalization for CF, include the leading space in the calculation of answer length.
    • Restrict all inputs (with completions) to 2048 tokens for consistency across models
    • Use the default model precision when evaluating (i.e., avoid options like load_in_8bit unless it produces identical results).
    • OLMES uses the standard approach of two newlines to separate each in-context example.
      • (set unitxt format for olmes?)
    • Other than the original instruction line for MMLU (Hendrycks et al., 2021), we do not add any extra instructions. This follows previous work finding that subject information in instructions makes little difference to model rankings (Alzahrani et al., 2024), and it reduces additional sources of variation in the prompt.
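The instance-sampling rule in the first bullet can be sketched as follows — a minimal illustration, assuming a fixed seed and `random.sample` for the subsampling (the function name and seed are hypothetical, not from OLMES):

```python
import random

def select_instances(instances, limit=1500, sample_size=1000, seed=1234):
    """If the eval split has more than `limit` instances, sample `sample_size`
    of them with a fixed seed; otherwise use the split whole."""
    instances = list(instances)
    if len(instances) <= limit:
        return instances
    rng = random.Random(seed)
    return rng.sample(instances, sample_size)

print(len(select_instances(range(14042))))  # 1000 (subsampled)
print(len(select_instances(range(1200))))   # 1200 (used whole)
```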
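"Choose the best between CF and MCF at the system level" (Section 3.4) could look like the sketch below — per-task accuracies under both formats for one model, taking the max. The task names and scores here are made up for illustration:

```python
def best_of_cf_mcf(cf_scores, mcf_scores):
    """cf_scores / mcf_scores: dict mapping task name -> accuracy under the
    cloze (CF) and multiple-choice (MCF) formats. For each task, report the
    better of the two results."""
    return {task: max(cf_scores[task], mcf_scores[task]) for task in cf_scores}

cf = {"arc_challenge": 0.48, "mmlu": 0.31}
mcf = {"arc_challenge": 0.55, "mmlu": 0.29}
print(best_of_cf_mcf(cf, mcf))  # {'arc_challenge': 0.55, 'mmlu': 0.31}
```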
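The macro- vs. micro-average distinction for MMLU can be made concrete with a small sketch (task names real, result lists invented): macro averaging weights each of the 57 tasks equally, while micro averaging weights each of the 14042 instances equally, so small subtasks count for more under macro.

```python
def macro_accuracy(per_task_results):
    """per_task_results: dict mapping task name -> list of 0/1 correctness
    values. Average the per-task accuracies, weighting each task equally."""
    per_task_acc = [sum(v) / len(v) for v in per_task_results.values()]
    return sum(per_task_acc) / len(per_task_acc)

def micro_accuracy(per_task_results):
    """Pool all instances, weighting each instance equally."""
    pooled = [r for v in per_task_results.values() for r in v]
    return sum(pooled) / len(pooled)

results = {
    "abstract_algebra": [1, 0, 1, 1],  # 0.75 on 4 instances
    "anatomy": [0, 0],                 # 0.00 on 2 instances
}
print(macro_accuracy(results))  # 0.375 — tasks weighted equally
print(micro_accuracy(results))  # 0.5   — instances weighted equally
```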
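The "character" normalization detail is easy to get wrong, so here is a hedged sketch: divide the summed log-probability of the completion by its length in characters, where the completion includes the leading space before the answer text (the function name is hypothetical):

```python
def char_normalized_logprob(answer_text, total_logprob):
    """Per-character normalization for CF scoring: the leading space that
    precedes the answer text counts toward the answer length."""
    completion = " " + answer_text
    return total_logprob / len(completion)

# A 5-character answer plus its leading space -> divide by 6, not 5.
print(char_normalized_logprob("Paris", -3.0))  # -0.5
```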
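The two-newline separator between in-context examples amounts to a simple join; a minimal sketch (the example texts are placeholders, not the curated OLMES shots):

```python
def build_prompt(shot_texts, test_text):
    """Separate each in-context example, and the test instance, with two
    newlines, as in the standard few-shot format."""
    return "\n\n".join(shot_texts + [test_text])

shots = ["Question: Q1\nAnswer: A1", "Question: Q2\nAnswer: A2"]
prompt = build_prompt(shots, "Question: Q3\nAnswer:")
print(prompt.count("\n\n"))  # 2 separators joining 3 blocks
```

If unitxt is used to implement this, it would correspond to configuring a format whose demo separator is `"\n\n"`, per the open sub-task above.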