Fix currently uploaded eval-harness numbers for 1.3B ; 6.7B #37

haileyschoelkopf · 2022-12-20T05:19:00Z

Currently, some of the 0-shot and 5-shot evals I ran appear to be wrong (the 6.7B and 1.3B evals, for sure). Not sure what went wrong, but rerunning is quick.

I'll pull the ones that may be bad from the repo ASAP! We'll need to rerun these.

StellaAthena · 2022-12-27T18:57:20Z

Crossposting from the Discord for transparency: we found a bug in our code introduced by a new feature we added between the training and evaluation of the models. It has been corrected but many, if not all, of the evaluations need to be rerun.

haileyschoelkopf · 2022-12-27T23:46:22Z

Yup! I believe all but 1.3B are now corrected, though I need to do another pass through them all and delete the bad ones. Should have those soon; the cluster has just been giving me trouble with freezes and not accepting my jobs for some reason.

Also todo:

Create .feather file + instructions to use for plots with Igor's code
Rename lambada task to lambada_openai everywhere?
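
If the rename goes ahead, it could be done as a small pass over the stored result files. Below is a minimal sketch, assuming each result is a JSON-style dict with top-level "results" and "versions" mappings keyed by task name; that layout is an assumption here and should be checked against the actual files. The function name `rename_task` and the example values are hypothetical placeholders, not real eval numbers.

```python
import json


def rename_task(results: dict, old: str = "lambada", new: str = "lambada_openai") -> dict:
    """Return a copy of a results dict with one task key renamed.

    Assumes (hypothetically) that task names appear as keys under the
    top-level "results" and "versions" sections. Other sections are left as-is.
    """
    renamed = dict(results)  # shallow copy so the caller's dict is not modified
    for section in ("results", "versions"):
        tasks = renamed.get(section)
        if isinstance(tasks, dict) and old in tasks:
            if new in tasks:
                # Refuse to silently overwrite an existing entry.
                raise ValueError(f"{section!r} already contains {new!r}")
            # Rebuild the mapping, preserving the original key order.
            renamed[section] = {(new if k == old else k): v for k, v in tasks.items()}
    return renamed


# Placeholder values only; these are not real eval results.
example = {
    "results": {"lambada": {"acc": 0.0}, "piqa": {"acc": 0.0}},
    "versions": {"lambada": 0, "piqa": 0},
}

print(json.dumps(rename_task(example), indent=2))
```

Returning a copy rather than editing in place keeps the original files easy to diff against before anything is overwritten.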

haileyschoelkopf · 2023-01-01T17:47:29Z

All evals in this repo should be correct now, as far as I know!

haileyschoelkopf added the bug label Dec 20, 2022

StellaAthena assigned StellaAthena and haileyschoelkopf Dec 27, 2022

haileyschoelkopf closed this as completed Jan 1, 2023
