Mechanistic Interpretability Visualizations using React

Jupyter Notebook 182 28 Updated Jul 13, 2024

Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"

Python 1 Updated Jul 18, 2024

A curation of awesome tools, documents and projects about LLM Security.

882 86 Updated Aug 29, 2024

A framework for few-shot evaluation of language models.

Python 6,536 1,732 Updated Sep 26, 2024
Python 1 Updated Jul 1, 2024

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook 699 81 Updated Aug 14, 2024

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".

Python 80 18 Updated Aug 27, 2024

Sparse Autoencoder for Mechanistic Interpretability

Python 3 Updated Apr 20, 2024

Training Sparse Autoencoders on Language Models

Jupyter Notebook 381 104 Updated Sep 27, 2024

Sparse Autoencoder for Mechanistic Interpretability

Python 173 38 Updated Jul 20, 2024

Training Sparse Autoencoders on Language Models

HTML 1 Updated Jun 1, 2024
Jupyter Notebook 1 Updated May 27, 2024

LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces

Jupyter Notebook 73 22 Updated Sep 21, 2023
Jupyter Notebook 1 Updated Jun 28, 2024
Python 162 11 Updated Feb 22, 2024

This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".

Jupyter Notebook 17 2 Updated Mar 21, 2024

A scientific instrument for investigating latent spaces

Jupyter Notebook 552 18 Updated Sep 7, 2024

Code for internal lab sharing - polishing has started but is by no means complete

MATLAB 119 128 Updated Nov 22, 2023

A beautiful, simple, clean, and responsive Jekyll theme for academics

HTML 10,663 10,990 Updated Sep 26, 2024

Hopfield Networks is All You Need

Python 1,666 187 Updated Apr 23, 2023
Python 19 4 Updated Apr 11, 2024

Deep Learning - Spring 2022 - Final Project

Python 1 Updated May 4, 2022