llama3-base gsm8k score #1896

rangehow · 2024-05-28T08:20:15Z

I ran into a problem related to #1799.
LLaMA3-8B-base scores significantly lower on the GSM8K benchmark than the officially reported number.

Using the gsm8k_cot task in lm_eval_harness, I get a score of 50+, the same as in #1799 (comment), while the officially reported score is 79.6.

Any idea what causes this?

haileyschoelkopf · 2024-05-28T12:19:23Z

The 79.6 GSM8K number reported by Meta comes from their instruct model. I'm not certain whether they've reported "official" GSM8K scores for the base model.
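
One way to check that a comparison is like-for-like is to run the same task against both checkpoints. The sketch below only builds the two `lm_eval` command lines and does not run them. The Hugging Face model IDs, the batch size, and the helper name `lm_eval_command` are assumptions for illustration, not something stated in this thread.

```python
import shlex
from typing import List

# Assumed Hugging Face model IDs; the thread does not name exact checkpoints.
BASE = "meta-llama/Meta-Llama-3-8B"
INSTRUCT = "meta-llama/Meta-Llama-3-8B-Instruct"


def lm_eval_command(pretrained: str, task: str = "gsm8k_cot", batch_size: int = 8) -> List[str]:
    """Build an lm_eval CLI invocation for one checkpoint (hypothetical helper)."""
    return [
        "lm_eval",
        "--model", "hf",
        "--model_args", f"pretrained={pretrained}",
        "--tasks", task,
        "--batch_size", str(batch_size),  # batch size is an arbitrary choice here
    ]


if __name__ == "__main__":
    # Print one command per checkpoint so the two runs differ only in the model.
    for name in (BASE, INSTRUCT):
        print(shlex.join(lm_eval_command(name)))
```

Keeping every argument except the model fixed makes it easier to tell whether a gap comes from the checkpoint or from the evaluation setup.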

rangehow · 2024-05-29T07:48:12Z

Sorry, I didn't pay attention to the subtitle. Yes, 79.6 is the llama3-8B-it score :)

rangehow changed the title ~~llama3-base gsm8k score se~~ llama3-base gsm8k score May 28, 2024

rangehow closed this as completed May 29, 2024
