Add a calibration error statistic #126

Merged: 21 commits merged into main from calibration on Mar 16, 2023
Conversation

norabelrose (Member) commented Mar 14, 2023
Created a CalibrationError class for computing the expected calibration error (ECE), based on https://arxiv.org/abs/2012.08668. We use it to compute and log the probe's ECE in train.py.

Depends on #124
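For context, a minimal sketch of what an equal-mass-binning ECE statistic in the spirit of that paper can look like. The class name matches the PR, but the internals below are illustrative assumptions, not the PR's actual code:

```python
import torch
from torch import Tensor


class CalibrationError:
    """Accumulates labels and confidences, then reports the expected
    calibration error (ECE) using equal-mass binning."""

    def __init__(self, num_bins: int = 15):
        self.num_bins = num_bins
        self.labels: list[Tensor] = []
        self.pred_probs: list[Tensor] = []

    def update(self, labels: Tensor, pred_probs: Tensor) -> None:
        # Flatten so every (label, confidence) pair counts once.
        self.labels.append(labels.flatten())
        self.pred_probs.append(pred_probs.flatten())

    def compute(self) -> float:
        labels = torch.cat(self.labels).float()
        probs = torch.cat(self.pred_probs)

        # Equal-mass binning: sort by confidence, then split into bins
        # holding (roughly) the same number of examples each.
        order = probs.argsort()
        probs, labels = probs[order], labels[order]

        n = probs.numel()
        ece = 0.0
        for p, y in zip(probs.tensor_split(self.num_bins),
                        labels.tensor_split(self.num_bins)):
            if p.numel() == 0:
                continue
            # |mean confidence - empirical accuracy|, weighted by bin mass.
            ece += (p.numel() / n) * (p.mean() - y.mean()).abs().item()
        return ece
```

Usage in that sketch would be `stat = CalibrationError(); stat.update(labels, pred_probs); stat.compute()`.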

tests/test_math.py: review thread (outdated, resolved)
AlexTMallen (Collaborator) left a comment
I looked over the paper and the code, and this looks good to me.

@@ -190,7 +199,6 @@ def score(self, labels: Tensor, x_pos: Tensor, x_neg: Tensor) -> EvalResult:
# makes `num_variants` copies of each label, all within a single
AlexTMallen (Collaborator): This comment is no longer attached to its code.

norabelrose (Member, Author): Fixed.
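A hypothetical illustration (shapes assumed, not taken from the diff) of the label broadcasting that the truncated comment above describes:

```python
import torch

labels = torch.tensor([0, 1, 1])                    # shape (n,)
num_variants = 2

# Each label is repeated once per variant; all copies live in one
# flat dimension so they line up with flattened per-variant predictions.
broadcast = labels.repeat_interleave(num_variants)  # tensor([0, 0, 1, 1, 1, 1])
```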

@@ -182,6 +184,13 @@ def score(self, labels: Tensor, x_pos: Tensor, x_neg: Tensor) -> EvalResult:

pred_probs = self.predict(x_pos, x_neg)
AlexTMallen (Collaborator): So we are implicitly averaging over all the heads and variants? Also, just looking at this now, I'm not sure how it works when num_variants > 1 and num_heads > 1.

norabelrose (Member, Author): No, this doesn't average over heads and variants. We actually need to fully support the num_heads > 1 case; we don't right now.
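One possible way to support multiple heads (a sketch under assumed shapes, reusing the CalibrationError sketch above; not the PR's code) is to compute a separate ECE per head:

```python
from torch import Tensor


def per_head_ece(labels: Tensor, pred_probs: Tensor, num_bins: int = 15) -> list[float]:
    """Assumed shapes: labels is (n,), pred_probs is (n, num_variants, num_heads)."""
    n, num_variants, num_heads = pred_probs.shape
    scores = []
    for h in range(num_heads):
        stat = CalibrationError(num_bins)
        # One label copy per variant, aligned with the row-major flatten below.
        stat.update(
            labels.repeat_interleave(num_variants),
            pred_probs[..., h].flatten(),
        )
        scores.append(stat.compute())
    return scores
```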

@norabelrose norabelrose merged commit bb8fadf into main Mar 16, 2023
@norabelrose norabelrose deleted the calibration branch March 16, 2023 06:35