
Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. #162

Merged
merged 33 commits into from
May 28, 2024

Conversation

jbloomAus
Owner

Description

In this PR, we support HookedSAETransformer (ported over from TransformerLens) and refactor a good deal of internal code. I'm sorry this PR is so large and contains so many breaking changes; I'm hoping the need for future refactors will be much smaller. We expect the changes to be fairly superficial and easy to adapt to, except for those working on forks. Feel free to reach out for assistance or clarification if you are trying to update a fork. Finally, we think it's unlikely there are major regressions or newly introduced bugs, but the test coverage now looks lower because it is reported over the entire repo rather than just the training subpackage.

New Features:

  • HookedSAETransformer has been ported from TransformerLens to SAE Lens! (Note: use_error_term is a property of SAE, not SAEConfig.)
  • I've added more SAEs to from_pretrained (and will likely add more in the coming days).
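The use_error_term behaviour can be illustrated with a toy sketch (plain Python; the class and method names here are hypothetical stand-ins, not the real sae_lens API): when the error term is enabled, the SAE adds its reconstruction error back onto its output, so splicing the SAE into the model leaves the downstream computation unchanged while feature activations can still be inspected.

```python
# Toy illustration (NOT the real sae_lens API): an "SAE" that optionally
# adds its reconstruction error back, making it a no-op when spliced in.
class ToySAE:
    def __init__(self, use_error_term: bool = False):
        # Per this PR, use_error_term lives on the SAE itself, not its config.
        self.use_error_term = use_error_term

    def encode(self, x):
        # Lossy "encoding": keep only positive parts (stand-in for ReLU features).
        return [max(v, 0.0) for v in x]

    def decode(self, feats):
        return list(feats)  # identity decoder, for simplicity

    def __call__(self, x):
        recon = self.decode(self.encode(x))
        if self.use_error_term:
            # Add back what the SAE failed to reconstruct, recovering x exactly.
            return [r + (xi - r) for r, xi in zip(recon, x)]
        return recon

x = [1.0, -2.0, 3.0]
print(ToySAE(use_error_term=False)(x))  # lossy reconstruction: [1.0, 0.0, 3.0]
print(ToySAE(use_error_term=True)(x))   # exact passthrough:    [1.0, -2.0, 3.0]
```

With the error term on, the model behaves as if no SAE were attached; with it off, you see the effect of replacing activations with their SAE reconstruction.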

Breaking changes:

Features we removed:

  • We recently removed training multiple SAEs in parallel with one dataloader. This wasn't very useful in practice and made the code quite nasty in places.
  • We also recently removed resuming training when SAE training jobs are interrupted. This required storing much larger objects and caused a breakdown in many of the abstractions. I've kept the "save on ctrl-c" functionality because that seems useful. We might restore resumption in the future.

Renaming:

  • Classes have been renamed so that we use "SAE" instead of "SparseAutoencoder" wherever possible.
  • Some modules have been moved around to better reflect dependency structures.
  • Key config items have been renamed to drop "_point" wherever it appears (e.g. hook_point_layer becomes hook_layer). This brings naming closer in line with TransformerLens.
  • The dtype and device config arguments of runners are now strings, and only strings. This simplifies high-level APIs and typing, and keeps config classes always serializable.
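A minimal sketch of why string-only dtype/device matters (field names here are illustrative, not the actual SAE Lens config): a config holding only strings round-trips through JSON cleanly, and the runner resolves the strings to framework objects (e.g. torch.float32) only at runtime.

```python
import json
from dataclasses import dataclass, asdict

# Illustrative sketch (hypothetical field names): storing dtype/device as
# plain strings keeps the config JSON-serializable end to end.
@dataclass
class RunnerConfig:
    model_name: str = "gpt2"
    dtype: str = "float32"   # always a string, never e.g. torch.float32
    device: str = "cpu"      # likewise "cuda", "mps", ...

cfg = RunnerConfig()
blob = json.dumps(asdict(cfg))              # serializes without custom encoders
restored = RunnerConfig(**json.loads(blob)) # round-trips to an equal config
assert restored == cfg
```

Had dtype been a torch.dtype object, json.dumps would raise; keeping strings pushes the conversion to one well-defined place in the runner.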

Notes:

  • SAEs now have their own configs. The base SAE config includes the minimal information (hopefully) needed to run and use the SAE. You can mock this info if loading your own SAEs, but we recommend you find the appropriate values so the SAE can be used correctly (e.g. specifying the context length it was trained on, or whether prompts have a BOS token prepended).
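The kind of minimal metadata described above can be sketched as follows (the field names are hypothetical; consult the real SAEConfig in sae_lens for the actual ones). If you load your own weights you can mock these values, but using the true training-time values avoids subtle mismatches such as running the SAE on sequences longer than it ever saw.

```python
from dataclasses import dataclass

# Hypothetical sketch of minimal SAE metadata; field names are illustrative.
@dataclass
class MinimalSAEConfig:
    d_in: int          # width of the model activations the SAE reads
    d_sae: int         # number of learned features
    hook_name: str     # where in the model the SAE attaches
    context_size: int  # sequence length the SAE was trained on
    prepend_bos: bool  # whether training prompts had a BOS token prepended

cfg = MinimalSAEConfig(
    d_in=768,
    d_sae=24576,
    hook_name="blocks.8.hook_resid_pre",
    context_size=128,
    prepend_bos=True,
)
```

Everything here is a plain value, consistent with the string-only dtype/device policy: the config stays serializable and easy to mock.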

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

Checklist:

  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have not rewritten tests relating to key interfaces which would affect backward compatibility

You have run formatting, typing, and unit-test checks (acceptance tests are not currently in use)

  • I have run make check-ci to check format and linting. (you can run make format to format code if needed.)

Performance Check.

If you have implemented a training change, please indicate precisely how performance changes with respect to the following metrics:

  • L0
  • CE Loss
  • MSE Loss
  • Feature Dashboard Interpretability

Please link to wandb dashboards with a control and a test group.

@jbloomAus jbloomAus changed the title Quality of Lif Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. Quality of Life Refactor of SAE Lens adding SAE Analysis with HookedSAETransformer and some other breaking changes. May 27, 2024

codecov bot commented May 27, 2024

Codecov Report

Attention: Patch coverage is 59.28500% with 410 lines in your changes missing coverage. Please review.

Project coverage is 52.34%. Comparing base (eb9489a) to head (3faeae8).

Files Patch % Lines
sae_lens/training/sae.py 0.00% 172 Missing ⚠️
sae_lens/sae_training_runner.py 30.76% 63 Missing ⚠️
sae_lens/training/sae_trainer.py 68.13% 46 Missing and 12 partials ⚠️
sae_lens/sae.py 86.74% 12 Missing and 12 partials ⚠️
sae_lens/training/training_sae.py 88.05% 19 Missing ⚠️
sae_lens/config.py 64.86% 13 Missing ⚠️
sae_lens/analysis/hooked_sae_transformer.py 86.41% 6 Missing and 5 partials ⚠️
sae_lens/analysis/neuronpedia_runner.py 0.00% 11 Missing ⚠️
sae_lens/toolkit/pretrained_sae_loaders.py 57.69% 10 Missing and 1 partial ⚠️
sae_lens/evals.py 66.66% 4 Missing and 3 partials ⚠️
... and 6 more
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #162       +/-   ##
===========================================
- Coverage   67.13%   52.34%   -14.80%     
===========================================
  Files          19       26        +7     
  Lines        1710     2755     +1045     
  Branches      267      462      +195     
===========================================
+ Hits         1148     1442      +294     
- Misses        504     1240      +736     
- Partials       58       73       +15     


@jbloomAus
Owner Author

I'm leaving this here for a few hours while I dm some people for feedback. This will likely be merged shortly.

@jbloomAus jbloomAus merged commit e4eaccc into main May 28, 2024
5 of 7 checks passed