Cluster bootstrap for metrics; refactor metric computations into evaluate_preds #197

Merged: 13 commits merged into main from metric-refactor on Apr 19, 2023

@norabelrose (Member) commented Apr 17, 2023

I realized that our code for computing AUROC, accuracy, and calibrated accuracy was scattered across the codebase, with a fair amount of duplication. This PR refactors all of it into a single function, evaluate_preds, which is used for both reporters and logistic regression classifiers, in elicit as well as eval.

Other changes:

  1. Confidence intervals now use the cluster bootstrap, resampling entire groups of prompt templates at a time, to account for the fact that different variants of the same data point are not IID. This leads to substantially larger CIs than those reported in main.
  2. Partly to compensate for (1), I've increased the default for max_examples from [750, 250] to [1000, 1000]. With only 250 clusters, the data is just too noisy.
  3. Confidence intervals are now included for accuracy and calibrated accuracy.
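
For readers unfamiliar with the cluster bootstrap mentioned in (1), here is a minimal sketch of the idea, not the code in this PR. It assumes each cluster is the set of prompt-template variants of one underlying example, and the names `accuracy` and `cluster_bootstrap_ci` are hypothetical helpers chosen for illustration. It uses a plain percentile interval and only the standard library; the actual implementation may differ in all of these respects.

```python
import random


def accuracy(labels, preds):
    """Fraction of predictions that match their labels."""
    return sum(y == p for y, p in zip(labels, preds)) / len(labels)


def cluster_bootstrap_ci(clusters, metric=accuracy, num_samples=1000,
                         level=0.95, seed=0):
    """Percentile CI for `metric`, resampling whole clusters with replacement.

    `clusters` is a list of clusters; each cluster is a list of (label, pred)
    pairs, e.g. all prompt-template variants of one underlying example
    (an assumption made for this sketch).
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    stats = []
    for _ in range(num_samples):
        # Draw as many clusters as we started with, keeping each one intact,
        # so correlated variants of the same example always move together.
        resampled = rng.choices(clusters, k=len(clusters))
        pairs = [pair for cluster in resampled for pair in cluster]
        labels, preds = zip(*pairs)
        stats.append(metric(labels, preds))

    stats.sort()
    alpha = (1.0 - level) / 2.0
    lower = stats[int(alpha * (num_samples - 1))]
    upper = stats[int((1.0 - alpha) * (num_samples - 1))]

    # Point estimate on the original, un-resampled data.
    all_pairs = [pair for cluster in clusters for pair in cluster]
    estimate = metric(*zip(*all_pairs))
    return estimate, lower, upper


if __name__ == "__main__":
    # Four clusters of two variants each; (label, prediction) pairs.
    demo = [
        [(1, 1), (1, 1)],
        [(0, 1), (0, 1)],
        [(1, 1), (1, 0)],
        [(0, 0), (0, 0)],
    ]
    print(cluster_bootstrap_ci(demo))
```

Because whole clusters are drawn together, the effective sample size is the number of clusters rather than the number of rows, which is why the resulting intervals tend to be wider than those from a row-level bootstrap.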

@norabelrose changed the title from "Refactor metric computations into evaluate_preds" to "Cluster bootstrap for AUROC; refactor metric computations into evaluate_preds" on Apr 17, 2023
@norabelrose changed the title from "Cluster bootstrap for AUROC; refactor metric computations into evaluate_preds" to "Cluster bootstrap for metrics; refactor metric computations into evaluate_preds" on Apr 17, 2023

@AlexTMallen (Collaborator) left a comment

LGTM

@norabelrose merged commit 4d65f9c into main on Apr 19, 2023
4 checks passed
@norabelrose deleted the metric-refactor branch on Apr 19, 2023, 19:14