Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. #162

jbloomAus · 2024-05-27T17:01:03Z

Description

In this PR, we support HookedSAETransformer (coming over from TransformerLens) and refactor a bunch of stuff internally. I'm sorry this PR is so large and has so many breaking changes. I'm hoping that future refactors will be much smaller. We expect the changes to be fairly superficial / easy to adapt to, except for those working on forks. Feel free to reach out for assistance / clarification if you are trying to update a fork. Finally, we think it's likely there are no major regressions or newly introduced bugs, but test coverage now looks lower because it is reported over the entire repo and not just the training subpackage.

New Features:

HookedSAETransformer has been ported from TransformerLens to SAE Lens! (Note: use_error_term is a property of SAE, not SAEConfig.)
I've added more SAEs to from_pretrained (and will likely add more in the coming days).
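To illustrate the note about use_error_term, here is a minimal sketch. ToySAEConfig and ToySAE are hypothetical stand-ins, not the SAE Lens classes, and the sketch assumes the error term means adding the reconstruction residual back so that the output matches the input. The point is only that the flag lives on the instance and can be toggled at analysis time, while the config holds static settings.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToySAEConfig:
    # Hypothetical stand-in config: only static settings live here.
    d_in: int
    d_sae: int


class ToySAE:
    """Hypothetical stand-in for an SAE; not the SAE Lens implementation."""

    def __init__(self, cfg: ToySAEConfig, use_error_term: bool = False):
        self.cfg = cfg
        # Runtime switch stored on the instance, not in the config.
        self.use_error_term = use_error_term

    def reconstruct(self, x: List[float]) -> List[float]:
        # Deliberately lossy "reconstruction" so the residual is non-zero.
        return [round(v, 0) for v in x]

    def forward(self, x: List[float]) -> List[float]:
        recon = self.reconstruct(x)
        if not self.use_error_term:
            return recon
        # Assumed meaning of the error term: add back what the SAE lost.
        error = [xi - ri for xi, ri in zip(x, recon)]
        return [ri + ei for ri, ei in zip(recon, error)]


sae = ToySAE(ToySAEConfig(d_in=3, d_sae=8))
lossy = sae.forward([0.25, 1.75, -3.5])   # reconstruction only
sae.use_error_term = True                 # toggled on the object itself
exact = sae.forward([0.25, 1.75, -3.5])   # residual added back
```

With the flag off, downstream computation sees the lossy reconstruction; with it on, downstream values are unchanged while the SAE's internals remain available for inspection.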

Breaking changes:

Features we removed:

Recently, we removed training SAEs in parallel with one dataloader. This wasn't very useful in practice and made the code quite nasty in places.
We also recently removed resuming training when SAE training jobs are interrupted. This required storing much larger objects and broke many of the abstractions. I've kept the "save on ctrl-c" functionality because that seems useful. We might restore resuming in the future.

Renaming:

Classes have been renamed so that we use "SAE" instead of "SparseAutoencoder" wherever possible.
Some modules have been moved around to better reflect dependency structures.
Key config items have been renamed to remove “_point” wherever it is present. This brings naming closer in line with TransformerLens.
Config arguments dtype/device of runners are now strings and only strings. This makes things simpler when dealing with high level APIs / typing. We want cfg classes to always be serializable.
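As a rough illustration of why string-only dtype/device fields help with serialization, here is a minimal sketch. ToyRunnerConfig and its methods are hypothetical and are not the SAE Lens config classes; the sketch only shows that a config holding plain strings round-trips through JSON without custom encoders.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ToyRunnerConfig:
    # Hypothetical config; field names and defaults are illustrative only.
    dtype: str = "float32"
    device: str = "cpu"

    def __post_init__(self) -> None:
        # Reject non-string values early so the config stays serializable.
        for name in ("dtype", "device"):
            value = getattr(self, name)
            if not isinstance(value, str):
                raise TypeError(f"{name} must be a str, got {type(value).__name__}")

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "ToyRunnerConfig":
        return cls(**json.loads(payload))


cfg = ToyRunnerConfig(dtype="float32", device="cpu")
restored = ToyRunnerConfig.from_json(cfg.to_json())
```

Under this design, the string would be converted to a framework object (for example a torch dtype or device) only at the point where tensors are created.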

Notes:

SAEs now have their own configs. The base SAE config includes the minimal information (hopefully) needed to run / use the SAE. You can mock this info if loading in your own SAEs, but we recommend you find the appropriate values so that the SAE can be used correctly (e.g. specifying the context length it was trained on or whether prompts have a BOS token prepended).
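A small sketch of why these values matter when using an SAE. ToySAEMetadata, prepare_tokens, and the field names context_size and prepend_bos are assumptions made for illustration, not a statement of the real config schema. The idea is that prompts fed to the model at analysis time should be shaped the way they were during SAE training.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ToySAEMetadata:
    # Hypothetical subset of what an SAE config might record about training.
    context_size: int
    prepend_bos: bool


def prepare_tokens(
    token_ids: List[int], meta: ToySAEMetadata, bos_token_id: int
) -> List[int]:
    """Shape a prompt to match the conditions recorded in the metadata."""
    tokens = list(token_ids)
    # Prepend BOS only if training used it and the prompt lacks it.
    if meta.prepend_bos and (not tokens or tokens[0] != bos_token_id):
        tokens = [bos_token_id] + tokens
    # Keep at most context_size tokens, counting the BOS token.
    return tokens[: meta.context_size]


meta = ToySAEMetadata(context_size=4, prepend_bos=True)
tokens = prepare_tokens([5, 6, 7, 8, 9], meta, bos_token_id=0)
```

If these values are mocked incorrectly, the SAE may see activations from positions or prompt formats it was never trained on.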

Type of change

Please delete options that are not relevant.

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)
This change requires a documentation update

Checklist:

I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have tested formatting, typing and unit tests (acceptance tests not currently in use)

I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

Performance Check.

If you have implemented a training change, please indicate precisely how performance changes with respect to the following metrics:

L0
CE Loss
MSE Loss
Feature Dashboard Interpretability

Please link to wandb dashboards with a control and test group.

codecov · 2024-05-27T17:05:55Z

Codecov Report

Attention: Patch coverage is 59.28500%, with 410 lines in your changes missing coverage. Please review.

Project coverage is 52.34%. Comparing base (eb9489a) to head (3faeae8).

Files	Patch %	Lines
sae_lens/training/sae.py	0.00%	172 Missing ⚠️
sae_lens/sae_training_runner.py	30.76%	63 Missing ⚠️
sae_lens/training/sae_trainer.py	68.13%	46 Missing and 12 partials ⚠️
sae_lens/sae.py	86.74%	12 Missing and 12 partials ⚠️
sae_lens/training/training_sae.py	88.05%	19 Missing ⚠️
sae_lens/config.py	64.86%	13 Missing ⚠️
sae_lens/analysis/hooked_sae_transformer.py	86.41%	6 Missing and 5 partials ⚠️
sae_lens/analysis/neuronpedia_runner.py	0.00%	11 Missing ⚠️
sae_lens/toolkit/pretrained_sae_loaders.py	57.69%	10 Missing and 1 partial ⚠️
sae_lens/evals.py	66.66%	4 Missing and 3 partials ⚠️
... and 6 more

Additional details and impacted files

@@             Coverage Diff             @@
##             main     #162       +/-   ##
===========================================
- Coverage   67.13%   52.34%   -14.80%     
===========================================
  Files          19       26        +7     
  Lines        1710     2755     +1045     
  Branches      267      462      +195     
===========================================
+ Hits         1148     1442      +294     
- Misses        504     1240      +736     
- Partials       58       73       +15

jbloomAus · 2024-05-27T17:09:28Z

I'm leaving this here for a few hours while I dm some people for feedback. This will likely be merged shortly.

ckkissane and others added 30 commits May 23, 2024 12:10

move HookedSAETransformer from TL

b8462ca

add tests

2606e9f

move runners one level up

050f8e6

fix docs name

01b5188

trainer clean up

34d0b2b

create training sae, not fully seperate yet

6cd1a0d

remove accidentally commited notebook

11bd7c8

commit working code in the middle of refactor, more work to do

b55ab79

don't use act layers plural

a2921af

make tutorial not use the activation store

38211c0

moved this file

babbed5

move import of toy model runner

8fbac1d

saes need to store at least enough information to run them

5d57b92

further refactor and add tests

f751634

finish act store device rebase

e01fc20

fix config type not caught by test

ebcc7d2

partial progress, not yet handling error term for hooked sae transformer

efc999c

bring tests in line with trainer doing more work

9f2850a

revert some of the simplification to preserve various features, ghost grads, noising

40ef22a

hooked sae transformer is working

74c0c89

homogenize configs

bbf34d3

re-enable sae compilation

3549db9

remove old file that doesn't belong

14e00f5

include normalize activations in base sae config

22addfe

make sure tutorial works

d861239

don't forget to update pbar

386bc40

rename sparse autoencoder to sae for brevity

574bd37

move non-training specific modules out of training

eab3332

rename to remove _point

cc849a8

first steps towards better docs

31f5080

final cleanup

67efb18

jbloomAus changed the title ~~Quality of Lif Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes.~~ Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. May 27, 2024

jbloom-md added 2 commits May 28, 2024 10:53

have ci use same test coverage total as make check-ci

2f6db9e

clean up docs a bit

3faeae8

jbloomAus merged commit e4eaccc into main May 28, 2024
5 of 7 checks passed
