Store final layer LM output and record AUROC and acc #165
Conversation
I tested it, and it seems that the LM outputs appear when I use elicit but not when I use eval -- not sure if I'm missing something.
Edit: never mind, this isn't necessary.
```python
    "Encoder-decoder model doesn't have expected get_encoder() method"
)

model = assert_type(PreTrainedModel, model.get_encoder())
```
So are we removing support for only using the encoder of encoder-decoder models?
Yes, because that complicates things and isn't how inference on encoder-decoder models is normally done.
```python
# TODO: Do something smarter than "rindex" here. Really we want to
# get the span of the answer directly from Jinja, but that doesn't
# seem possible. This approach may fail for complex templates.
answer_start = text.rindex(choice["answer"])
```
It seems this fails if the answer is the empty string, or if the answer appears in the question. I think the empty-string case is important, since it may be what makes the method work better. It would also fail if the answer string isn't exactly what's used in the template (are there other cases?).
I'm wondering if we should store the completion string (the second half of the jinja template) in the prompt dataset, and if it's empty and the user is attempting to use encoder-decoder models, we raise an exception.
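The failure modes described above are easy to see with a toy example (the strings here are invented for illustration, not the project's real templates):

```python
# Normal case: the answer appears once, at the end of the rendered prompt.
text = "Is the sky blue? Yes"
assert text.rindex("Yes") == 17

# Failure mode 1: an empty answer string. rindex("") returns len(text)
# rather than raising, silently producing a zero-length answer span.
assert text.rindex("") == len(text)

# Failure mode 2: the answer string also appears inside the question.
# rindex matches the *last* occurrence, which happens to be right here,
# but a complex template could place text after the answer and break this.
text2 = "Yes or no: is the sky blue? Yes"
assert text2.rindex("Yes") == 28
assert text2.index("Yes") == 0  # plain index would match the question

# Failure mode 3: the rendered template doesn't contain the raw answer
# string verbatim (e.g. the template changes its casing); rindex raises.
try:
    "Is the sky blue? YES".rindex("Yes")
except ValueError:
    pass  # substring not found
```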
I hope it's not necessary to include the answer in the question, because that would slow down inference (since if the answer only appears at the end you can cache the keys and values).
> I'm wondering if we should store the completion string (the second half of the jinja template) in the prompt dataset, and if it's empty and the user is attempting to use encoder-decoder models, we raise an exception.
Yes.
In general I'm like, yes, we should do the templates better, but I don't want that to hold up merging this PR
```python
# Only feed question, not the answer, to the encoder for enc-dec models
if model.config.is_encoder_decoder:
    # TODO: Maybe make this more generic for complex templates?
    text = text[:answer_start].rstrip()
```
Why do we strip on the right here? This kind of assumes the answer string starts with whitespace, correct?
No, we don't assume that. The concatenation of the encoder input and the decoder input doesn't have to make sense as a full sentence or anything. I'm sort of assuming the answer string does not start with whitespace, and the encoder isn't going to expect trailing whitespace in its input (although I doubt it actually matters).
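To make the whitespace handling concrete, a minimal sketch of the split (example strings are invented, not the project's real templates):

```python
# Full rendered prompt: question plus answer. For enc-dec models,
# only the part before the answer span goes to the encoder.
text = "Q: Is the sky blue? A: Yes"
answer = "Yes"

answer_start = text.rindex(answer)

# rstrip removes the separating space from the encoder input; the
# answer string itself does not need to start with whitespace.
encoder_text = text[:answer_start].rstrip()
decoder_text = answer

assert encoder_text == "Q: Is the sky blue? A:"
assert decoder_text == "Yes"
```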
I reviewed all the changes since my last review and they look good to me
Also refactors the logging to use Pandas, and drops Python 3.9 support because I was too lazy to refactor my cool span conversion function to work on 3.9.
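The PR title mentions recording AUROC and accuracy per layer; a hedged sketch of what Pandas-based logging of such records could look like (column names and values here are invented for illustration, not the project's actual schema):

```python
import pandas as pd

# Hypothetical per-layer evaluation records.
records = [
    {"layer": 0, "auroc": 0.61, "acc": 0.58},
    {"layer": 1, "auroc": 0.74, "acc": 0.70},
    {"layer": 2, "auroc": 0.83, "acc": 0.79},
]
df = pd.DataFrame(records)

# One row per layer; easy to persist with df.to_csv(...) and to query,
# e.g. finding the layer with the best AUROC.
best = df.loc[df["auroc"].idxmax()]
assert best["layer"] == 2
```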