This repository provides the code for training, probing, and intervening on the Othello-GPT described in this paper, to be presented at ICLR 2023.
The implementation is based on minGPT, thanks to Andrej Karpathy.
- Installation
- Training Othello-GPT
- Probing Othello-GPT
- Intervening Othello-GPT
- Attribution via Intervention Plots
- How to Cite
Some plotting functions require LaTeX on your machine: see this FAQ for installation instructions.
Then use these commands to set up:
```shell
conda env create -f environment.yml
conda activate othello
python -m ipykernel install --user --name othello --display-name "othello"
mkdir -p ckpts/battery_othello
```
Download the championship dataset and the synthetic dataset and save them in the `data` subfolder.
Then see `train_gpt_othello.ipynb` for training and validation. Alternatively, trained checkpoints can be downloaded from here to skip this step.
Under the default experiment setting, use

```shell
jupyter nbconvert --execute --to notebook --allow-errors --ExecutePreprocessor.timeout=-1 train_gpt_othello.ipynb --inplace --output ckpts/checkpoint.ipynb
```

to run it in the background.
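The training objective here is next-move prediction over game transcripts. The following toy snippet sketches only that objective, not the actual minGPT training loop; the vocabulary size of 60 matches the paper's move set (64 squares minus the 4 center squares), while the fake game and the random stand-in for model outputs are purely illustrative:

```python
import numpy as np

# Schematic of the next-move-prediction objective only, not the repo's code.
rng = np.random.default_rng(0)
vocab = 60                                # one token per playable square
game = rng.integers(0, vocab, size=10)    # a fake "game transcript"

# Stand-in for model outputs: one logit vector per next-move prediction.
logits = rng.normal(size=(len(game) - 1, vocab))

# Cross-entropy of each next move given its prefix.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(game) - 1), game[1:]]).mean()
print(loss > 0)  # True: random predictions incur positive loss
```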
Then we will use `train_probe_othello.py` to train probes. For example, to train a nonlinear (two-layer) probe with hidden size 64 on layer-6 representations of the championship-trained model, run:

```shell
python train_probe_othello.py --layer 6 --twolayer --mid_dim 64 --championship
```
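The shape of such a probe can be sketched as follows. This is a toy forward pass, not the repo's implementation; the residual-stream width `d_model` and the per-square 3-way classification (empty / mine / yours) are assumptions made for illustration, with `mid_dim = 64` matching the flag above:

```python
import numpy as np

# Toy two-layer (nonlinear) probe: residual-stream activation -> board state.
rng = np.random.default_rng(0)
d_model, mid_dim, n_squares, n_states = 512, 64, 64, 3  # assumed dimensions

W1 = rng.normal(0, 0.02, size=(d_model, mid_dim))
b1 = np.zeros(mid_dim)
W2 = rng.normal(0, 0.02, size=(mid_dim, n_squares * n_states))
b2 = np.zeros(n_squares * n_states)

def probe(activation: np.ndarray) -> np.ndarray:
    """Map one activation vector to per-square board-state logits."""
    hidden = np.maximum(activation @ W1 + b1, 0.0)  # ReLU nonlinearity
    return (hidden @ W2 + b2).reshape(n_squares, n_states)

logits = probe(rng.normal(size=d_model))
print(logits.shape)  # (64, 3): one 3-way classification per square
```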
Checkpoints will be saved to `ckpts/battery_othello`, or can alternatively be downloaded from here. The script that produces these checkpoints is `produce_probes.sh`.
See `intervening_probe_interact_column.ipynb` for the intervention experiment, where we can customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to intervene on.
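The core idea of an intervention can be sketched with a toy stand-in for the model: edit the hidden state between two stages of computation and let the rest of the forward pass continue. Everything below (the two weight matrices, the sign-flip edit) is hypothetical and only illustrates the mechanism:

```python
import numpy as np

# Toy activation-level intervention, not the notebook's actual code.
rng = np.random.default_rng(1)
W_in = rng.normal(size=(8, 16))   # hypothetical first stage of a model
W_out = rng.normal(size=(16, 4))  # hypothetical second stage

def forward(x, edit=None):
    h = np.tanh(x @ W_in)         # hidden state ~ probed board representation
    if edit is not None:
        h = edit(h)               # the intervention happens here
    return h @ W_out              # downstream logits

x = rng.normal(size=8)
clean = forward(x)
# Flip the sign of one hidden unit (index 3) mid-computation:
flipped = forward(x, edit=lambda h: np.where(np.arange(16) == 3, -h, h))
print(np.allclose(clean, flipped))  # False: the edit propagates downstream
```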
See `plot_attribution_via_intervention_othello.ipynb` for the attribution via intervention experiment, where we can also customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to attribute the prediction to.
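Attribution via intervention can be sketched in the same toy setting: attribute a prediction to each candidate feature by intervening on (here, zeroing) that feature and measuring how much a target logit drops. The linear stand-in model below is an assumption for illustration only:

```python
import numpy as np

# Toy attribution-via-intervention, not the notebook's actual code.
rng = np.random.default_rng(2)
W = rng.normal(size=(6, 3))  # hypothetical feature -> logit map

def logit(x, target=0):
    return (x @ W)[target]

x = rng.normal(size=6)
baseline = logit(x)
attribution = []
for i in range(6):
    patched = x.copy()
    patched[i] = 0.0                            # intervene on one feature
    attribution.append(baseline - logit(patched))
# Larger values mark features whose removal hurts the target logit most;
# for this linear toy model, the per-feature effects sum to the baseline.
print(len(attribution))  # 6
```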
```bibtex
@inproceedings{
li2023emergent,
title={Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task},
author={Kenneth Li and Aspen K Hopkins and David Bau and Fernanda Vi{\'e}gas and Hanspeter Pfister and Martin Wattenberg},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=DeG07_TcZvT}
}
```