train probe per prompt #271

derpyplops · 2023-07-13T14:24:45Z

Solves NOT-291

This is a fairly complex change. It trains one reporter model per prompt, then evaluates the reporters both on each individual prompt and with the mean credence across prompts. I should probably also add some tests for the new file structure.
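For readers unfamiliar with the idea, here is a minimal sketch of "one probe per prompt, evaluated per prompt and with the mean credence". It is not the code in this PR: the probe is a plain logistic regression rather than the project's reporter classes, the hidden states are synthetic toy data, and every name in it (`train_probe`, `credence`, `make_toy_split`, and so on) is hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def credence(probe, x):
    """Probability the probe assigns to the positive label for hidden state x."""
    w, b = probe
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_probe(xs, ys, lr=0.5, epochs=200):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim, n = len(xs[0]), len(xs)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(xs, ys):
            err = credence((w, b), x) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def make_toy_split(num_prompts, num_examples, dim, rng):
    """Toy hidden states: prompt p encodes the label along axis p % dim."""
    labels = [i % 2 for i in range(num_examples)]
    hiddens = []
    for p in range(num_prompts):
        rows = []
        for y in labels:
            x = [rng.gauss(0.0, 0.5) for _ in range(dim)]
            x[p % dim] += 2.0 if y else -2.0
            rows.append(x)
        hiddens.append(rows)
    return hiddens, labels


def accuracy(creds, labels):
    """Fraction of examples where thresholding the credence at 0.5 is correct."""
    return sum((c > 0.5) == bool(y) for c, y in zip(creds, labels)) / len(labels)


rng = random.Random(0)
NUM_PROMPTS, DIM = 3, 4
train_h, train_y = make_toy_split(NUM_PROMPTS, 40, DIM, rng)
test_h, test_y = make_toy_split(NUM_PROMPTS, 40, DIM, rng)

# One probe per prompt, each trained only on that prompt's hidden states.
probes = [train_probe(train_h[p], train_y) for p in range(NUM_PROMPTS)]

# Each probe scores the test examples rendered with its own prompt.
test_creds = [
    [credence(probes[p], x) for x in test_h[p]] for p in range(NUM_PROMPTS)
]

# Evaluation 1: accuracy of each probe on its own prompt.
per_prompt_acc = [accuracy(c, test_y) for c in test_creds]

# Evaluation 2: average the credences across prompts per example, then score.
mean_creds = [sum(col) / NUM_PROMPTS for col in zip(*test_creds)]
mean_acc = accuracy(mean_creds, test_y)

print("per-prompt accuracy:", per_prompt_acc)
print("mean-credence accuracy:", mean_acc)
```

The sketch assumes each probe only ever sees hidden states from its own prompt, at both training and evaluation time; whether the PR also cross-evaluates probes on other prompts is not stated here.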

The new flag is --probe_per_prompt, added to Run.
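As a rough illustration of what an opt-in boolean flag like this looks like, the sketch below wires --probe_per_prompt into a stand-in config using the standard library's argparse. The `Run` dataclass here is a hypothetical stand-in with a single field, the parser is not the project's actual CLI machinery, and the help text is an assumption about the flag's meaning.

```python
import argparse
from dataclasses import dataclass


@dataclass
class Run:
    """Hypothetical stand-in for the real Run config; only the new field is shown."""

    probe_per_prompt: bool = False


def parse_run(argv):
    """Parse a list of CLI tokens into the stand-in Run config."""
    parser = argparse.ArgumentParser(prog="elk-sketch")
    parser.add_argument(
        "--probe_per_prompt",
        action="store_true",  # absent -> False, present -> True
        help="train one probe per prompt (assumed meaning)",
    )
    args = parser.parse_args(argv)
    return Run(probe_per_prompt=args.probe_per_prompt)


with_flag = parse_run(["--probe_per_prompt"])
without_flag = parse_run([])
print(with_flag, without_flag)
```

Defaulting the field to False keeps existing invocations unchanged, which matches the suggestion to test both with and without the flag.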

To test, run elk elicit gpt2 imdb --num_gpus 2 --probe_per_prompt, then run it again without the flag.
elk eval should also work.

lauritowal

We'll run the sweeps first and see whether it improves anything. If it does, we can review and merge this.

derpyplops added 4 commits July 13, 2023 15:27

added cli arg

47bcfb2

refactor reporter training

a50fe57

WIP add multiprobe training

52b1394

multiprobe elicit works

2420ae0

derpyplops force-pushed the not-291-train-probe-per-prompt branch from daec121 to 2420ae0 Compare July 14, 2023 15:40

derpyplops added 15 commits July 14, 2023 16:46

fix pyright

f35626a

implemented multi probe for elicit

898c3f1

undo list

01d5baa

add more types and sorting

7701c29

weird duplicate arg

4310def

resolved circular import

96a3dab

fixed index passing

9c2def0

fixed index passing again

29eeb7f

add assert

f533418

fix prompt index in loading

0f5ce0b

remove redundant method

327d1eb

correctly eval with multiple probes and some renaming

51b7d3c

remove wrong function

75fe560

pyright

1b6757a

pytest

0d6c8b9

derpyplops marked this pull request as ready for review July 20, 2023 15:44

pyright

785537b

derpyplops requested review from norabelrose and lauritowal July 20, 2023 15:57

lauritowal reviewed Jul 31, 2023
