This repository provides the code for training, probing, and intervening on the Othello-GPT described in this paper, to be presented at ICLR 2023.
The implementation is based on minGPT, thanks to Andrej Karpathy.
- Installation
- Training Othello-GPT
- Probing Othello-GPT
- Intervening Othello-GPT
- Attribution via Intervention Plots
- How to Cite
Some plotting functions require LaTeX on your machine: see this FAQ for installation instructions.
Then use these commands to set up:
```shell
conda env create -f environment.yml
conda activate othello
python -m ipykernel install --user --name othello --display-name "othello"
mkdir -p ckpts/battery_othello
```
Download the championship dataset and the synthetic dataset and save them in the `data` subfolder.
Then see `train_gpt_othello.ipynb` for training and validation. Alternatively, trained checkpoints can be downloaded from here to skip this step.
Under the default experiment setting, use

```shell
jupyter nbconvert --execute --to notebook --allow-errors --ExecutePreprocessor.timeout=-1 train_gpt_othello.ipynb --inplace --output ckpts/checkpoint.ipynb
```

to run it in the background.
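The training objective here is next-move prediction over game transcripts. The following toy snippet sketches only that objective, not the actual minGPT training loop; the vocabulary size of 60 matches the paper's move set (64 squares minus the 4 center squares), while the fake game and the random stand-in for model outputs are purely illustrative:

```python
import numpy as np

# Schematic of the next-move-prediction objective only, not the repo's code.
rng = np.random.default_rng(0)
vocab = 60                                # one token per playable square
game = rng.integers(0, vocab, size=10)    # a fake "game transcript"

# Stand-in for model outputs: one logit vector per next-move prediction.
logits = rng.normal(size=(len(game) - 1, vocab))

# Cross-entropy of each next move given its prefix.
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
loss = -np.log(probs[np.arange(len(game) - 1), game[1:]]).mean()
print(loss > 0)  # True: random predictions incur positive loss
```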
Then we will use `train_probe_othello.py` to train probes. For example, to train a nonlinear (two-layer) probe with hidden size 64 on layer-6 representations of the championship-trained model, run:

```shell
python train_probe_othello.py --layer 6 --twolayer --mid_dim 64 --championship
```
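The shape of such a probe can be sketched as follows. This is a toy forward pass, not the repo's implementation; the residual-stream width `d_model` and the per-square 3-way classification (empty / mine / yours) are assumptions made for illustration, with `mid_dim = 64` matching the flag above:

```python
import numpy as np

# Toy two-layer (nonlinear) probe: residual-stream activation -> board state.
rng = np.random.default_rng(0)
d_model, mid_dim, n_squares, n_states = 512, 64, 64, 3  # assumed dimensions

W1 = rng.normal(0, 0.02, size=(d_model, mid_dim))
b1 = np.zeros(mid_dim)
W2 = rng.normal(0, 0.02, size=(mid_dim, n_squares * n_states))
b2 = np.zeros(n_squares * n_states)

def probe(activation: np.ndarray) -> np.ndarray:
    """Map one activation vector to per-square board-state logits."""
    hidden = np.maximum(activation @ W1 + b1, 0.0)  # ReLU nonlinearity
    return (hidden @ W2 + b2).reshape(n_squares, n_states)

logits = probe(rng.normal(size=d_model))
print(logits.shape)  # (64, 3): one 3-way classification per square
```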
Checkpoints will be saved to `ckpts/battery_othello`, or can alternatively be downloaded from here. The script that produces these checkpoints is `produce_probes.sh`.
See `intervening_probe_interact_column.ipynb` for the intervention experiment, where we can customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to intervene on.
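The core idea of an intervention can be sketched with a toy stand-in for the model: edit the hidden state between two stages of computation and let the rest of the forward pass continue. Everything below (the two weight matrices, the sign-flip edit) is hypothetical and only illustrates the mechanism:

```python
import numpy as np

# Toy activation-level intervention, not the notebook's actual code.
rng = np.random.default_rng(1)
W_in = rng.normal(size=(8, 16))   # hypothetical first stage of a model
W_out = rng.normal(size=(16, 4))  # hypothetical second stage

def forward(x, edit=None):
    h = np.tanh(x @ W_in)         # hidden state ~ probed board representation
    if edit is not None:
        h = edit(h)               # the intervention happens here
    return h @ W_out              # downstream logits

x = rng.normal(size=8)
clean = forward(x)
# Flip the sign of one hidden unit (index 3) mid-computation:
flipped = forward(x, edit=lambda h: np.where(np.arange(16) == 3, -h, h))
print(np.allclose(clean, flipped))  # False: the edit propagates downstream
```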
See `plot_attribution_via_intervention_othello.ipynb` for the attribution via intervention experiment, where we can also customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to attribute the prediction to.
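Attribution via intervention can be sketched in the same toy setting: attribute a prediction to each candidate feature by intervening on (here, zeroing) that feature and measuring how much a target logit drops. The linear stand-in model below is an assumption for illustration only:

```python
import numpy as np

# Toy attribution-via-intervention, not the notebook's actual code.
rng = np.random.default_rng(2)
W = rng.normal(size=(6, 3))  # hypothetical feature -> logit map

def logit(x, target=0):
    return (x @ W)[target]

x = rng.normal(size=6)
baseline = logit(x)
attribution = []
for i in range(6):
    patched = x.copy()
    patched[i] = 0.0                            # intervene on one feature
    attribution.append(baseline - logit(patched))
# Larger values mark features whose removal hurts the target logit most;
# for this linear toy model, the per-feature effects sum to the baseline.
print(len(attribution))  # 6
```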
```bibtex
@inproceedings{
li2023emergent,
title={Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task},
author={Kenneth Li and Aspen K Hopkins and David Bau and Fernanda Vi{\'e}gas and Hanspeter Pfister and Martin Wattenberg},
booktitle={The Eleventh International Conference on Learning Representations},
year={2023},
url={https://openreview.net/forum?id=DeG07_TcZvT}
}
```