Finalize 0 and 5-shot evals #46

haileyschoelkopf · 2022-12-30T04:58:42Z

Closes #16 and #38.

Finally finishing the 0-shot and 5-shot evals on the Pythia suite. Ideally this gets finished and merged by tomorrow, TBD.

TODO before merging:

- Upload all evals
- Sanity-check final PPL numbers on LAMBADA (TODO: 19M, 125M, 350M models)
- Create a data.feather file (for use with Igor's lm-plot code) containing all evals
- Write a README section explaining how to use the plotting code and how files are laid out in the repo (WIP)
- Create some sample plots to confirm that no evals look suspect or affected by the bug we had
- Decide whether to label e.g. the step 71500 evals on 1.3B as "step 143000" for the 4M bs models, then rename all JSONs with the correct model names + step numbers
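
For the step-naming item in the TODO list above, a minimal sketch of the arithmetic involved. It assumes the "step 143000" label refers to a 2M-token reference batch size, which is inferred from the 71500 to 143000 ratio rather than stated in this PR; `equivalent_step` is a hypothetical helper, not code from the repo.

```python
def equivalent_step(step: int, batch_tokens: int, reference_batch_tokens: int) -> int:
    """Map a step at one batch size to the step at a reference batch size
    that has consumed the same number of training tokens."""
    tokens_seen = step * batch_tokens
    if tokens_seen % reference_batch_tokens != 0:
        raise ValueError("step does not line up with a whole reference step")
    return tokens_seen // reference_batch_tokens


# Assumed batch sizes in tokens: 4M for the models in question, 2M as the reference.
FOUR_M = 4 * 2**20
TWO_M = 2 * 2**20

print(equivalent_step(71500, FOUR_M, TWO_M))  # 143000
```

Naming by equivalent step keeps checkpoints that have seen the same number of tokens aligned across model sizes, at the cost of the label no longer matching the optimizer step actually recorded for the 4M bs runs.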

Will request your review, @StellaAthena, when the TODOs are all complete.
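
For the data.feather item in the TODO list above, a rough sketch of how the eval JSONs could be gathered into one table. It assumes each JSON has a top-level `"results"` mapping of task to metric values and that a long-format table is acceptable; the column names and both helper functions are hypothetical, and the schema Igor's lm-plot code actually expects is not described in this PR. Writing Feather with pandas requires `pyarrow` to be installed.

```python
import json
from pathlib import Path


def collect_eval_rows(root):
    """Walk `root` for eval JSONs and flatten them into one row per (file, task, metric)."""
    rows = []
    for path in sorted(Path(root).rglob("*.json")):
        with open(path) as f:
            payload = json.load(f)
        # Assumed layout: {"results": {task: {metric: value, ...}, ...}, ...}
        for task, metrics in payload.get("results", {}).items():
            for metric, value in metrics.items():
                if isinstance(value, (int, float)):  # skip non-numeric entries
                    rows.append(
                        {"file": path.name, "task": task, "metric": metric, "value": value}
                    )
    return rows


def write_feather(rows, out_path="data.feather"):
    """Write the collected rows to a Feather file (needs pandas and pyarrow)."""
    import pandas as pd  # imported lazily so row collection works without pandas

    pd.DataFrame(rows).to_feather(out_path)
```

Usage would be along the lines of `write_feather(collect_eval_rows("path/to/evals"))`, where the directory is a placeholder for wherever the eval JSONs live in the repo.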

haileyschoelkopf · 2022-12-31T03:18:09Z

Leaving the README TODOs and filename updates for a later PR. I'll write a guide to generating plots with @igor0's code later.

haileyschoelkopf added 8 commits December 29, 2022 23:10

- remove renamed folders (2411fac)
- rename 13B evals (0b44f71)
- rename 6.7B evals (bcac6d5)
- remove questionable 1.3B evals (53c3ec2)
- add amended 1.3B and 1.3B dedup evals (da2c8de)
- fix 125M and 350M evals (1657543)
- remove old 125M and 350M evals (70def13)
- remove old 19M folders (9211de0)

haileyschoelkopf marked this pull request as ready for review December 31, 2022 03:15

haileyschoelkopf merged commit 7811832 into main Dec 31, 2022

lintangsutawika pushed a commit that referenced this pull request Jun 19, 2023

- Merge pull request #46 from EleutherAI/cleanup_evals (d0c0e23)
