SAELens exists to help researchers:
- Train sparse autoencoders.
- Analyse sparse autoencoders and carry out mechanistic interpretability research.
- Generate insights that make it easier to create safe and aligned AI systems.
Please refer to the documentation for information on how to:
- Download and analyse pre-trained sparse autoencoders.
- Train your own sparse autoencoders.
- Generate feature dashboards with the SAE-Vis library.
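To make the training step concrete, here is a minimal, self-contained sketch of what training a sparse autoencoder involves: a ReLU encoder, a linear decoder, and a loss combining reconstruction error with an L1 sparsity penalty on the feature activations. This is plain NumPy with hand-derived gradients, purely for illustration; it is not the SAELens API, and every name in it (`W_enc`, `W_dec`, `l1_coeff`, and so on) belongs to this sketch alone.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_sae, batch = 16, 64, 128  # toy sizes for illustration
l1_coeff, lr = 1e-3, 5e-2

W_enc = rng.normal(0, 0.1, (d_model, d_sae))
b_enc = np.zeros(d_sae)
W_dec = rng.normal(0, 0.1, (d_sae, d_model))
b_dec = np.zeros(d_model)

def step(x):
    """One SGD step on reconstruction + L1 sparsity loss; returns the loss."""
    global W_enc, b_enc, W_dec, b_dec
    # Forward pass: ReLU encoder, linear decoder.
    pre = x @ W_enc + b_enc
    z = np.maximum(pre, 0.0)               # sparse feature activations
    x_hat = z @ W_dec + b_dec
    err = x_hat - x
    loss = (err ** 2).mean() + l1_coeff * np.abs(z).sum(axis=1).mean()
    # Backward pass (hand-derived gradients of MSE + L1 penalty).
    g_xhat = 2.0 * err / err.size
    g_Wdec = z.T @ g_xhat
    g_bdec = g_xhat.sum(axis=0)
    g_z = g_xhat @ W_dec.T + l1_coeff * np.sign(z) / x.shape[0]
    g_pre = g_z * (pre > 0)
    g_Wenc = x.T @ g_pre
    g_benc = g_pre.sum(axis=0)
    # Plain SGD update.
    W_enc -= lr * g_Wenc; b_enc -= lr * g_benc
    W_dec -= lr * g_Wdec; b_dec -= lr * g_bdec
    return loss

# Stand-in for model activations; in real use these come from a language model.
x = rng.normal(size=(batch, d_model))
losses = [step(x) for _ in range(200)]
print(f"loss: {losses[0]:.4f} -> {losses[-1]:.4f}")
```

In practice SAELens handles activation collection, resampling, and logging for you; the point here is only the shape of the objective being optimised.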
SAELens is the result of many contributors working collectively to improve humanity's understanding of neural networks, many of whom are motivated by a desire to safeguard humanity from risks posed by artificial intelligence.
This library is maintained by Joseph Bloom and David Chanin.
Tutorials:
- Loading and Analysing Pre-Trained Sparse Autoencoders
- Understanding SAE Features with the Logit Lens
- Training a Sparse Autoencoder
Feel free to join the Open Source Mechanistic Interpretability Slack for support!
Research:
Reference Implementations: