Blazing fast bootstrap stderrs for AUROC #190

norabelrose · 2023-04-15T22:11:51Z

Adds bootstrap standard errors everywhere we report AUROC figures, fixing #116.

Doing this the naive way, with a Python loop that calls sklearn.metrics.roc_auc_score on each resample, turned out to be quite slow (over a full second on CPU for each layer). Luckily, GPT-4 helped me write a custom PyTorch implementation of AUROC that supports batching, so the computation is vectorized across all the bootstrap-resampled datasets at once. The relevant functions are roc_auc and roc_auc_ci in elk.metrics. Even on CPU, roc_auc_ci is about 20x faster than the naive for-loop baseline, and on GPU it is faster still. The bootstrap CI is no longer a significant bottleneck, so you might as well use roc_auc_ci wherever you want to compute an AUROC.

This PR does depend on #179 even though it probably doesn't need to, because I was too lazy to rebase. I'm hoping #179 will get merged today anyway so it won't matter.

As a bonus, this PR allows us to get rid of our dependency on sklearn, although we do still need it as a [dev] dependency for the tests.

norabelrose and others added 30 commits April 4, 2023 11:28

LM output evaluation for autoregressive models

d292c7c

move to own baseline file

7ed5ccd

cleanup

ba1d3b2

Support encoder-decoder model LM output

a20d4ca

Merge remote-tracking branch 'origin/main' into lm-output

088758e

isort

77d7418

Bug fixes

5bf63f4

Merge branch 'main' into lm-output

819cfed

Merge branch 'main' into lm-output

d3d9a8d

Remove test_log_csv_elements

b89e23c

Remove Python 3.9 support

9aef842

Add Pandas to pyproject.toml

0851d4f

add code (contains still same device cuda error)

207a375

fix multiple cuda error, save evals to right folder + cleanup

e7efcce

Merge branch 'main' into eval_lr

b5fa54c

[pre-commit.ci] auto fixes from pre-commit.com hooks

4f8bdc5

for more information, see https://pre-commit.ci

Fix bug noticed by Waree

9ca72ba

Merge remote-tracking branch 'origin/eval_lr' into lm-output

d7e4893

Merge remote-tracking branch 'origin/main' into lm-output

bcdca8a

Add sanity check to load_prompts and refactor binarize

713a251

Changing a ton of stuff

0c35bc7

Merge remote-tracking branch 'origin/main' into lm-output

f6a762a

Revert changes to binarize

f547744

Stupid prompt_counter bug

ab1909f

Merge remote-tracking branch 'origin/main' into lm-output

f58290f

Remove stupid second set_start_method call

f912ee6

Merge remote-tracking branch 'origin/lm-output' into multiclass

606dcad

Merge remote-tracking branch 'origin/main' into multiclass

0038792

Fix bugs in binary case

83b480b

Various little refactors

3e66262

norabelrose and others added 20 commits April 13, 2023 06:06

Refactor handling of multiple datasets

da4c72f

Various fixes

e1675f7

Merge remote-tracking branch 'origin/main' into multi-ds-eval

8cc325b

Fix math tests

14987e1

Fix smoke tests

88683fa

All tests working ostensibly

a6c382e

Make CCS normalization customizable

ecc53cb

log each dataset individually

18c7f4c

Merge branch 'multi-ds-eval' into multiclass

94a900c

Fix label_column bug

5173649

GLUE MNLI works on Deberta

3e6c39c

Move pseudo AUROC stuff to CcsReporter

1e9ce06

Make 'datasets' and 'label_columns' config options more opinionated

35a8f34

tiny spacing change

615bbb1

Allow for toggling CV

f021404

Merge branch 'multi-ds-eval' into multiclass

f6629ec

Remove duplicate dbpedia template

99f01c3

Merge branch 'main' into multiclass

f415f8d

Training on datasets with different numbers of classes now works

d16c96b

Efficient bootstrap CIs for AUROCs

044774e

norabelrose requested review from lauritowal and AlexTMallen April 15, 2023 22:11

norabelrose and others added 4 commits April 15, 2023 22:15

Fix CCS smoke test failure

a7f1ea0

Update extraction.py

3abeb60

remove typo

Merge branch 'main' into roc_auc

1e4a6b9

Update extraction.py

4c60061

fix typo

lauritowal approved these changes Apr 16, 2023

norabelrose merged commit 7b4a00c into main Apr 16, 2023

norabelrose deleted the roc_auc branch April 16, 2023 22:42

norabelrose mentioned this pull request May 19, 2023

Bootstrap CIs for AUROC metrics #116

Closed
