Raw extraction #200

AlexTMallen · 2023-04-18T05:46:40Z

Extracts hidden states without applying templates or constructing contrast tuples
Can be used with eval by specifying the magic dataset name “raw” and passing --data_dir
- Doesn't support few-shot examples; balances by default (though balancing is now optional for everything); doesn't support streaming (enforced in PromptConfig's __post_init__)
Adds support for inference without contrast tuples in Reporter
- Renames score to score_contrast_tuple
- I'm not sure whether these should instead be a single function that behaves differently depending on the shape of the input
The dataset provided in --data_dir must contain a string column “text” and a binary column “label”, and it must not have any splits
In this mode the total logprob that the LM assigns to the text is also computed
- That way you can perform ~whatever analyses you want by defining the input dataset and reading the output CSV
- I prepend tokenizer.bos_token to the input so that I can compute this. Will this always work and be in distribution?
Adds a base_fingerprint argument to the builder, which reads the fingerprint of the raw dataset so that caching stays correct when the raw dataset is modified
Adds support for saving the predictions to an output directory with --preds_out_dir
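
To make the logprob point above concrete, here is a minimal sketch of how a total logprob can be summed once a BOS token is prepended. It is not the code in this PR: log_softmax and total_logprob are hypothetical helpers, and the logits are toy values rather than real model output. The assumption is that logits[t] scores the token at position t + 1, so without the BOS at position 0 the first text token would have no prediction to score against.

```python
import math


def log_softmax(row):
    """Numerically stable log-softmax over one row of logits."""
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]


def total_logprob(logits, token_ids):
    """Sum log P(token | preceding tokens) over every token after the BOS.

    Assumes token_ids[0] is the prepended BOS id and that logits[t]
    holds the scores for the token at position t + 1.
    """
    if len(logits) != len(token_ids):
        raise ValueError("expected one row of logits per input token")
    total = 0.0
    # The BOS itself is never scored; it only conditions the first text token.
    for t in range(len(token_ids) - 1):
        total += log_softmax(logits[t])[token_ids[t + 1]]
    return total


# Toy example (hypothetical values): vocab of 3, BOS id 0, text tokens [1, 2].
uniform = [0.0, 0.0, 0.0]
toy_logits = [uniform, uniform, uniform]
toy_ids = [0, 1, 2]
result = total_logprob(toy_logits, toy_ids)
```

With uniform logits over three tokens, each of the two text tokens contributes log(1/3). Whether a prepended BOS keeps the input in distribution for a given model is a separate question that this sketch does not answer.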


norabelrose · 2023-04-18T06:01:29Z

Won't merge as-is; let's talk about ways to accomplish a similar goal within the templates system perhaps

AlexTMallen added 10 commits April 13, 2023 04:20

raw extraction support in eval; train still not implemented

430ca1e

fix logprobs

547c698

Merge branch 'main' into raw-extraction

947eb1e

fix caching of raw datasets

56cf0b4

ostensibly working raw inference

ffb7260

add --skip_balance flag

d983bd4

remove num_classes from load_prompts args

62a23a2

add hiddens_out_dir arg; modify raw caching slightly

c617d6a

add option to save reporter & lr outputs

e986195

Merge branch 'main' into raw-extraction

ad4cf34

AlexTMallen requested a review from norabelrose April 18, 2023 05:46

[pre-commit.ci] auto fixes from pre-commit.com hooks

4394bf9

for more information, see https://pre-commit.ci

norabelrose closed this Apr 21, 2023
