Othello World

This repository provides the code for training, probing, and intervening on Othello-GPT, as described in this paper, presented at ICLR 2023.
The implementation is based on minGPT; thanks to Andrej Karpathy.

Table of Contents

  1. Installation
  2. Training Othello-GPT
  3. Probing Othello-GPT
  4. Intervening Othello-GPT
  5. Attribution via Intervention Plots
  6. How to Cite

Installation

Some plotting functions require LaTeX on your machine; see this FAQ for installation instructions.
Then set up the environment with:

conda env create -f environment.yml
conda activate othello
python -m ipykernel install --user --name othello --display-name "othello"
mkdir -p ckpts/battery_othello
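
If you plan to train, it's worth first confirming that PyTorch can see your GPUs (the default training setting below assumes 8 of them). A minimal check, assuming the conda environment installs PyTorch:

import torch

# The default training configuration expects 8 visible GPUs.
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())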

Training Othello-GPT

Download the championship dataset and the synthetic dataset and save them in the data subfolder.
Then see train_gpt_othello.ipynb for training and validation. Alternatively, checkpoints can be downloaded from here to skip this step.
The default experiment setting requires 8 GPUs and takes roughly 12 GB of memory on each. Once the code is set up, you can run the notebook in the background with jupyter nbconvert --execute --to notebook --allow-errors --ExecutePreprocessor.timeout=-1 train_gpt_othello.ipynb --inplace --output ckpts/checkpoint.ipynb.
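
If you start from the downloaded checkpoints instead, a loading sketch might look like the following. The hyperparameters (8 layers, 8 heads, 512-dim embeddings, 61-token vocabulary, 59-token context) and the checkpoint filename are assumptions based on the paper's setup and the minGPT backbone; verify them against train_gpt_othello.ipynb.

import torch
from mingpt.model import GPT, GPTConfig  # minGPT backbone bundled with this repo

# Assumed Othello-GPT hyperparameters; check the notebook for the real values.
mconf = GPTConfig(vocab_size=61, block_size=59, n_layer=8, n_head=8, n_embd=512)
model = GPT(mconf)

# Hypothetical checkpoint path; use whichever file you downloaded.
model.load_state_dict(torch.load("ckpts/gpt_synthetic.ckpt", map_location="cpu"))
model.eval()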

Probing Othello-GPT

Use train_probe_othello.py to train probes.
For example, to train a nonlinear probe with hidden size 64 on internal representations extracted from layer 6 of the Othello-GPT trained on the championship dataset, run python train_probe_othello.py --layer 6 --twolayer --mid_dim 64 --championship.
Checkpoints will be saved to ckpts/battery_othello, or can alternatively be downloaded from here. These checkpoints are produced by produce_probes.sh.
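
For intuition, the nonlinear probe trained by --twolayer is just a small two-layer classifier mapping an internal activation to a board state. A minimal sketch, assuming 512-dim activations, --mid_dim 64 hidden units, and three classes per square (empty / own / opponent); see train_probe_othello.py for the real definition:

import torch.nn as nn

class TwoLayerProbe(nn.Module):
    # Nonlinear probe sketch: activation -> hidden -> 3-way square state.
    # Dimensions here are assumptions, not the repo's exact values.
    def __init__(self, act_dim=512, mid_dim=64, n_classes=3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(act_dim, mid_dim),
            nn.ReLU(),
            nn.Linear(mid_dim, n_classes),
        )

    def forward(self, x):
        return self.net(x)  # logits over {empty, own, opponent}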

Intervening Othello-GPT

See intervening_probe_interact_column.ipynb for the intervention experiment, where you can customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to intervene on.
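
Conceptually, the intervention edits an intermediate activation until a trained probe reads out the desired board state for the chosen square, then lets the remaining layers run on the edited activation. A gradient-based sketch of that idea (the exact procedure lives in the notebook; square_probe here is a hypothetical probe for a single square):

import torch
import torch.nn.functional as F

def intervene(activation, square_probe, target_state, lr=1.0, steps=10):
    # Nudge `activation` until `square_probe` predicts `target_state`
    # (e.g. 0=empty, 1=own, 2=opponent). Sketch only, not the notebook's code.
    x = activation.clone().detach().requires_grad_(True)
    target = torch.tensor([target_state])
    for _ in range(steps):
        loss = F.cross_entropy(square_probe(x).unsqueeze(0), target)
        loss.backward()
        with torch.no_grad():
            x -= lr * x.grad
            x.grad.zero_()
    return x.detach()  # feed this back into the remaining transformer layers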

Attribution via Intervention Plots

See plot_attribution_via_intervention_othello.ipynb for the attribution-via-intervention experiment, where you can likewise customize (1) which model to intervene on, (2) the pre-intervention board state, and (3) which square(s) to attribute.
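
The attribution score for a square is, roughly, how much the model's logit for a move changes when that square's state is intervened on. A sketch of the outer loop; predict_move_logit and its intervene_on argument are hypothetical stand-ins for the notebook's machinery:

def attribution_via_intervention(model, game_prefix, move, squares):
    # Score each square by the change it causes in the logit for `move`.
    baseline = predict_move_logit(model, game_prefix, move)
    scores = {}
    for sq in squares:
        # Re-run the model with the probe-based intervention applied to `sq`.
        edited = predict_move_logit(model, game_prefix, move, intervene_on=sq)
        scores[sq] = baseline - edited
    return scores  # larger score = square mattered more for this prediction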

How to Cite

@inproceedings{
  li2023emergent,
  title={Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task},
  author={Kenneth Li and Aspen K Hopkins and David Bau and Fernanda Vi{\'e}gas and Hanspeter Pfister and Martin Wattenberg},
  booktitle={The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=DeG07_TcZvT}
}
