Fit reporters on input embeddings as a sanity check #209

norabelrose · 2023-04-22T05:40:56Z

I noticed that on some (model, dataset) pairs the VINC and/or logistic regression AUROC is quite high after the very first layer, which seemed implausible to me. It struck me that we can fit reporters to the input embeddings as a sanity check on our results. If the input embedding AUROC is significantly higher than 0.5, there must be something wrong with the code, the prompt templates, or both, since you can't classify a statement as true or false by looking only at its very last token.
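To make the reasoning concrete, here is a minimal toy sketch. It is not the code in this PR. The vocabulary, embedding table, and probe are all hypothetical, and the probe is random rather than fitted. The point is only that any score computed from the last token's input embedding is a function of that token's id, so a template that ends every prompt in the same token forces AUROC to 0.5, while a template that leaks the label into the last token makes the classes perfectly separable.

```python
import random


def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def last_token_scores(last_token_ids, embedding, probe):
    """Score each example using only the input embedding of its last token."""
    return [
        sum(w * x for w, x in zip(probe, embedding[t])) for t in last_token_ids
    ]


rng = random.Random(0)

# Hypothetical 3-token vocabulary with 4-dimensional input embeddings.
embedding = {t: [rng.gauss(0, 1) for _ in range(4)] for t in range(3)}
# Hypothetical linear probe; a fitted reporter would behave the same way here.
probe = [rng.gauss(0, 1) for _ in range(4)]

labels = [0, 1, 0, 1, 1, 0]

# Clean template: every prompt ends in the same token, so all scores tie.
clean_ids = [2] * len(labels)
clean_auroc = auroc(labels, last_token_scores(clean_ids, embedding, probe))

# Leaky template: the last token encodes the label (token 0 = false, 1 = true).
leaky_ids = list(labels)
leaky_auroc = auroc(labels, last_token_scores(leaky_ids, embedding, probe))

print(f"clean template AUROC: {clean_auroc}")
print(f"leaky template AUROC: {leaky_auroc}")
```

With the leaky template the AUROC lands at 0.0 or 1.0 depending on the sign of the probe. Either value means the last token alone separates the classes, which is the kind of bug this check is meant to catch.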

CLAassistant · 2023-04-22T05:41:07Z

All committers have signed the CLA.

thejaminator · 2023-04-25T14:25:41Z

lgtm!

thejaminator

lgtm

norabelrose added 2 commits April 22, 2023 04:52

Start saving and fitting a reporter to the input embeddings

5ba1ddd

Rename layer 0 to 'input' to make it more clear

51ba54f

norabelrose added 2 commits April 22, 2023 05:48

Actually rename layer 0 correctly

544b485

Handle layer_stride correctly

43da44e

thejaminator self-assigned this Apr 25, 2023

thejaminator approved these changes Apr 25, 2023


norabelrose merged commit 3cd45b9 into main Apr 25, 2023
4 checks passed

norabelrose deleted the input-embeddings branch April 25, 2023 16:03
