Wannan Yang (winnieyangwannan)

4 followers · 4 following

Stars

TransformerLensOrg / CircuitsVis

Mechanistic Interpretability Visualizations using React

Jupyter Notebook 182 28 Updated Jul 13, 2024

winnieyangwannan / sae-transfer

Forked from ckkissane/sae-transfer

Code to reproduce key results accompanying "SAEs (usually) Transfer Between Base and Chat Models"

Python 1 Updated Jul 18, 2024

winnieyangwannan / jailbreak_dynamics

Forked from s-ball-10/jailbreak_dynamics

Python 1 Updated Jun 13, 2024

corca-ai / awesome-llm-security

A curation of awesome tools, documents and projects about LLM Security.

882 86 Updated Aug 29, 2024

EleutherAI / lm-evaluation-harness

A framework for few-shot evaluation of language models.

Python 6,536 1,732 Updated Sep 26, 2024

winnieyangwannan / URIAL

Forked from Re-Align/URIAL

Python 1 Updated Jul 1, 2024

andyzoujm / representation-engineering

Representation Engineering: A Top-Down Approach to AI Transparency

Jupyter Notebook 699 81 Updated Aug 14, 2024

andyrdt / refusal_direction

Code and results accompanying the paper "Refusal in Language Models Is Mediated by a Single Direction".

Python 80 18 Updated Aug 27, 2024

HuFY-dev / sparse_autoencoder

Forked from ai-safety-foundation/sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability

Python 3 Updated Apr 20, 2024

jbloomAus / SAELens

Training Sparse Autoencoders on Language Models

Jupyter Notebook 381 104 Updated Sep 27, 2024

ai-safety-foundation / sparse_autoencoder

Sparse Autoencoder for Mechanistic Interpretability

Python 173 38 Updated Jul 20, 2024

winnieyangwannan / SAELens

Forked from jbloomAus/SAELens

Training Sparse Autoencoders on Language Models

HTML 1 Updated Jun 1, 2024

winnieyangwannan / llam_101

Jupyter Notebook 1 Updated May 27, 2024

nrimsky / LM-exp

LLM experiments done during SERI MATS - focusing on activation steering / interpreting activation spaces

Jupyter Notebook 73 22 Updated Sep 21, 2023

winnieyangwannan / entity_tracking_update

Jupyter Notebook 1 Updated Jun 28, 2024

davidbau / baukit

Python 162 11 Updated Feb 22, 2024

Nix07 / finetuning

This repository contains the code used for the experiments in the paper "Fine-Tuning Enhances Existing Mechanisms: A Case Study on Entity Tracking".

Jupyter Notebook 17 2 Updated Mar 21, 2024

enjalot / latent-scope

A scientific instrument for investigating latent spaces

Jupyter Notebook 552 18 Updated Sep 7, 2024

buzsakilab / buzcode

Code for internal lab sharing - polishing has started but is by no means complete

MATLAB 119 128 Updated Nov 22, 2023

alshedivat / al-folio

A beautiful, simple, clean, and responsive Jekyll theme for academics

HTML 10,663 10,990 Updated Sep 26, 2024

winnieyangwannan / Selection-of-experience-for-memory-by-hippocampal-sharp-wave-ripples

Python 2 Updated Mar 14, 2024

ml-jku / hopfield-layers

Hopfield Networks is All You Need

Python 1,666 187 Updated Apr 23, 2023

sunchipsster1 / ConSpec

Python 19 4 Updated Apr 11, 2024

pazvives / DL-SP22-Project

Deep Learning - Spring 2022 - Final Project

Python 1 Updated May 4, 2022
