2 fixes:

- In the `forward` method, when `self.use_error_term` is True, things like centering and input activation functions weren't being applied correctly for non-standard architectures. I think this was just a mistake, and it is now fixed.
- In the `encode_gated` method, the hook `hook_sae_acts_post` pointed to the activations post-ReLU (or whatever the activation function is), but not to the actual output of this function, i.e. the post-ReLU activations multiplied by the masking values. I think a long-term solution has 2 separate hooks, one for each of these (e.g. `hook_sae_acts_post` and `hook_sae_mag_post`), but if we just have a single hook called `hook_sae_acts_post` then I think it makes a lot more sense for it to refer to the output of the encoder.
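
To make the hook placement described above concrete, here is a minimal pure-Python sketch of a gated encoder with the two-hook layout. It is not the library's implementation: `encode_gated_sketch`, the plain-list vectors, and the callback-dictionary hook mechanism are all assumptions made for illustration. Only the hook names `hook_sae_acts_post` and `hook_sae_mag_post` come from the description.

```python
from typing import Callable, Dict, List

Vector = List[float]
Hook = Callable[[Vector], None]


def relu(xs: Vector) -> Vector:
    """Elementwise ReLU on a plain list."""
    return [max(0.0, x) for x in xs]


def encode_gated_sketch(
    gate_pre: Vector,
    mag_pre: Vector,
    hooks: Dict[str, Hook],
) -> Vector:
    """Hypothetical toy gated encoder: output = (gate_pre > 0) * relu(mag_pre)."""
    # Binary mask from the gating path.
    mask = [1.0 if g > 0 else 0.0 for g in gate_pre]

    # Magnitudes after the activation function, before masking.
    mags = relu(mag_pre)
    if "hook_sae_mag_post" in hooks:
        hooks["hook_sae_mag_post"](mags)

    # Final encoder output: magnitudes multiplied by the mask.
    # This is the value hook_sae_acts_post sees in this sketch.
    acts = [m * k for m, k in zip(mags, mask)]
    if "hook_sae_acts_post" in hooks:
        hooks["hook_sae_acts_post"](acts)
    return acts


# Record what each hook observes.
captured: Dict[str, Vector] = {}
hooks: Dict[str, Hook] = {
    "hook_sae_mag_post": lambda v: captured.__setitem__("mag", list(v)),
    "hook_sae_acts_post": lambda v: captured.__setitem__("acts", list(v)),
}

out = encode_gated_sketch([1.0, -2.0, 0.5], [3.0, 4.0, -1.0], hooks)
```

In this toy run the second feature has a positive magnitude but a closed gate, so the two hooks see different values for it. That gap is the reason a single `hook_sae_acts_post` is more useful when it tracks the encoder output.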