Use concept-erasure implementation of LEACE and SAL #252

norabelrose · 2023-06-08T01:48:38Z

Now that concept-erasure is on PyPI, we can outsource our ConceptEraser implementation to that repo.

This PR makes LEACE, rather than SAL, the default method for pseudolabel and prompt template normalization. I should probably add a config option to change it though.
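One way the config option mentioned above could look is sketched here. The names `ErasureConfig` and `method` are hypothetical and are not taken from the elk codebase; the sketch only shows a validated setting that defaults to LEACE and can be switched back to SAL.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Hypothetical type alias: the two normalization methods named in this PR.
ErasureMethod = Literal["leace", "sal"]


@dataclass(frozen=True)
class ErasureConfig:
    """Hypothetical config holder; LEACE is the default, as in this PR."""

    method: ErasureMethod = "leace"

    def __post_init__(self) -> None:
        # Literal types are not enforced at runtime, so validate explicitly.
        if self.method not in get_args(ErasureMethod):
            raise ValueError(
                f"Unknown erasure method {self.method!r}; "
                f"expected one of {get_args(ErasureMethod)}"
            )


default_cfg = ErasureConfig()           # uses LEACE
sal_cfg = ErasureConfig(method="sal")   # opts back into SAL
```

Validating in `__post_init__` means a typo in a config file fails at construction time rather than partway through training.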

lauritowal · 2023-06-11T17:02:59Z

elk/training/ccs_reporter.py

@@ -265,12 +265,12 @@ def fit(self, hiddens: Tensor) -> float:
 self.norm.update(
 x=x_neg,
 # Independent indicator for each (template, pseudo-label) pair
- y=torch.cat([torch.zeros_like(prompt_ids), prompt_ids], dim=-1),
+ z=torch.cat([torch.zeros_like(prompt_ids), prompt_ids], dim=-1),


fixed; replaced y with z for ccs
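For readers unfamiliar with the indicator being passed as `z`, here is a toy sketch of what the concatenation in the hunk produces. It assumes `prompt_ids` is a one-hot template indicator repeated for every example, and that the positive half uses the mirrored concatenation; neither assumption is visible in the hunk itself.

```python
import torch

n, v = 2, 3  # toy sizes: number of examples and of prompt templates

# Assumption: one one-hot row per template, shared across all examples.
prompt_ids = torch.eye(v).expand(n, -1, -1)  # shape (n, v, v)

# As in the hunk: the negative pseudo-label occupies the second block of columns.
z_neg = torch.cat([torch.zeros_like(prompt_ids), prompt_ids], dim=-1)

# Assumed mirror image for the positive pseudo-label (not shown in the hunk).
z_pos = torch.cat([prompt_ids, torch.zeros_like(prompt_ids)], dim=-1)

print(z_neg.shape)  # one indicator column per (template, pseudo-label) pair
print(z_neg[0, 1])  # template 1, negative pseudo-label
print(z_pos[0, 1])  # template 1, positive pseudo-label
```

Each row is one-hot over the 2 * v (template, pseudo-label) pairs, so the positive and negative indicators never overlap.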

AlexTMallen

LGTM!

AlexTMallen · 2023-06-15T04:27:40Z

elk/training/eigen_reporter.py

nice :) no need for pseudolabels at inference time

lauritowal · 2023-07-09T19:56:36Z

elk/evaluation/evaluate.py

@@ -40,8 +39,7 @@ def apply_to_layer(
 experiment_dir = elk_reporter_dir() / self.source

 reporter_path = experiment_dir / "reporters" / f"layer_{layer}.pt"
- reporter = Reporter.load(reporter_path, map_location=device)
- reporter.eval()


Isn't the eval() here still needed?

norabelrose · 2023-07-10

No, because CcsReporter doesn't actually have any submodules like nn.BatchNorm or nn.Dropout whose behavior changes due to eval().
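The point about eval() can be checked with a small sketch. Here `nn.Linear` is only a stand-in for a probe without mode-dependent layers; it is not the actual CcsReporter.

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 8)

# Stand-in probe with no mode-dependent submodules.
linear_probe = nn.Linear(8, 1)
linear_probe.train()
out_train = linear_probe(x)
linear_probe.eval()
out_eval = linear_probe(x)
# train() and eval() produce identical outputs for a plain linear layer.
linear_same = torch.equal(out_train, out_eval)

# With dropout in the stack, the mode does change the forward pass.
dropout_probe = nn.Sequential(nn.Dropout(p=0.5), nn.Linear(8, 1))
dropout_probe.eval()
eval_a = dropout_probe(x)
eval_b = dropout_probe(x)
# In eval mode dropout is a no-op, so repeated calls agree.
dropout_eval_deterministic = torch.equal(eval_a, eval_b)

dropout_probe.train()
train_out = dropout_probe(x)
# In train mode inputs are randomly zeroed and rescaled, so outputs differ.
dropout_mode_matters = not torch.equal(train_out, eval_a)
```

If a reporter ever gains a dropout or batch-norm layer, the eval() call would need to come back.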

artkpv · 2023-07-26T17:06:51Z

JFI, my probes / reporters now won't load with this PR because I used Reporter.load. https://github.com/EleutherAI/elk/pull/252/files#diff-d08b84a509f043deeb98c9c642f692fffbd1967486738d2ff242b7897eb0b1ae

norabelrose · 2023-07-26T23:58:51Z

> JFI, my probes / reporters now won't load with this PR because I used Reporter.load. https://github.com/EleutherAI/elk/pull/252/files#diff-d08b84a509f043deeb98c9c642f692fffbd1967486738d2ff242b7897eb0b1ae

Sorry about that, we can't really guarantee backward compatibility at this point. You should be able to load the reporters with an older commit and extract the raw weights if necessary.
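A rough sketch of that workaround follows. It assumes the old checkpoints are whole pickled modules that expose `state_dict()`, and that the script runs from a checkout of an older commit where the pickled class is still importable. The helper name `extract_raw_weights` is hypothetical.

```python
import torch


def extract_raw_weights(old_path, new_path):
    """Load a pickled module and re-save only its tensors as a plain dict.

    Hypothetical helper: run it where the pickled class can still be imported.
    """
    # weights_only=False because, by assumption, the old file pickles a whole
    # object rather than a state dict. Only do this for files you trust.
    obj = torch.load(old_path, map_location="cpu", weights_only=False)

    # Copy the tensors out so the result no longer depends on the old class.
    state = {k: v.detach().clone() for k, v in obj.state_dict().items()}
    torch.save(state, new_path)
    return state
```

The resulting file holds only tensors keyed by parameter name, so it can be read with `torch.load` from any commit and mapped onto the new reporter by hand.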

Use concept-erasure implementation of LEACE and SAL

dc2cc49

norabelrose requested review from AlexTMallen and lauritowal June 8, 2023 01:48

fix parameter name in ccs

0a70094

lauritowal approved these changes Jun 11, 2023

norabelrose added 3 commits June 14, 2023 01:10

Fix test failures

280343c

Merge branch 'leace' of github.com:EleutherAI/elk into leace

703844c

Be picky about the concept-erasure version

fac6247

AlexTMallen approved these changes Jun 15, 2023

norabelrose added 3 commits July 6, 2023 03:42

Merge remote-tracking branch 'origin/main' into leace

0f6f120

Refactor to support concept-erasure v0.1

0f8d0a1

Fix test failure

3db2cc8

norabelrose requested a review from lauritowal July 6, 2023 20:34

lauritowal reviewed Jul 9, 2023

norabelrose merged commit a88c01a into main Jul 10, 2023
4 checks passed

norabelrose deleted the leace branch July 10, 2023 05:18
