llama3-base gsm8k score #1896

Closed
rangehow opened this issue May 28, 2024 · 2 comments

rangehow commented May 28, 2024

I ran into a problem related to #1799.
LLaMA3-8B-base scores significantly lower on the GSM8K benchmark than the officially reported figure.

Using the gsm8k_cot task in lm_eval_harness, I get a score of 50+, the same as in #1799 (comment), while the officially reported score is 79.6.

Any ideas about this?

rangehow changed the title from "llama3-base gsm8k score se" to "llama3-base gsm8k score" on May 28, 2024
haileyschoelkopf (Contributor) commented

The 79.6 GSM8K number reported by Meta comes from their instruct model. I'm not certain whether they've reported "official" GSM8K scores for the base model.

rangehow (Author) commented

Sorry for not paying attention to the subtitle. Yes, 79.6 is the llama3-8B-it score : )
