Finalize 0 and 5-shot evals #46

Merged: 8 commits merged into main on Dec 31, 2022

@haileyschoelkopf (Collaborator) commented Dec 30, 2022

Closes #16 and #38.

Finally finishing the 0- and 5-shot evals on the Pythia suite. Ideally I'll finish and merge by tomorrow (TBD).

TODO before merging:

  • Upload all evals
  • Sanity-check the final perplexity (PPL) numbers on LAMBADA (TODO: 19M, 125M, and 350M models)
  • Create a data.feather file containing all evals (for use with Igor's lm-plot code)
  • Write a README section explaining how to use the plotting code and how files are laid out in the repo (WIP)
  • Create some sample plots to confirm that no evals look suspect or were affected by the bug we had
  • Decide whether to label, e.g., the step-71500 evals of the 1.3B model as "step 143000" for the 4M-batch-size models, and rename all JSONs with the correct model names and step numbers

Will request your review, @StellaAthena, once all TODOs are complete.

@haileyschoelkopf marked this pull request as ready for review on Dec 31, 2022 03:15.

@haileyschoelkopf (Collaborator, Author) commented:

Leaving the README TODOs and filename updates for a later PR; I'll write a guide to generating plots with @igor0's code later.

@haileyschoelkopf merged commit 7811832 into main on Dec 31, 2022.
lintangsutawika pushed a commit that referenced this pull request Jun 19, 2023

Successfully merging this pull request may close these issues:

  • General zero and few-shot evaluations on model suite