Sweep Visualizations #245

Merged: 25 commits, May 19, 2023

Commits
37002b1 - create heatmap-visualizations for sweeps (lauritofzi, May 5, 2023)
e552f55 - fix viz path + cleanup (lauritofzi, May 7, 2023)
199c8d5 - initial (derpyplops, May 14, 2023)
1568cd0 - refactoring (derpyplops, May 14, 2023)
181283f - code fix (derpyplops, May 14, 2023)
2908c0f - fix deps (derpyplops, May 14, 2023)
6ab281e - fix elk sweep viz flag usage (derpyplops, May 15, 2023)
ee6e14b - fix typo (derpyplops, May 15, 2023)
5d526d7 - delete comment and factorize (derpyplops, May 16, 2023)
058cae8 - cleanup (lauritowal, May 16, 2023)
5162cab - [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 16, 2023)
398ec42 - Fix file resolution and factor out sweep_dir() (derpyplops, May 16, 2023)
9a79c8f - change to relative import (lauritowal, May 16, 2023)
cf0ad33 - Merge branch 'visualizations' of https://github.com/EleutherAI/elk in… (lauritowal, May 16, 2023)
d6bd7b7 - [pre-commit.ci] auto fixes from pre-commit.com hooks (pre-commit-ci[bot], May 16, 2023)
d4e99b0 - Address walt's comments and some tests (derpyplops, May 17, 2023)
a2e5f60 - Change write location to elk-reporters/{sweep}/viz (derpyplops, May 18, 2023)
68e4b52 - Edit README (derpyplops, May 18, 2023)
0e152bc - Fix TestGetModelPaths (derpyplops, May 18, 2023)
9d6552f - Fix duplicate bug (derpyplops, May 18, 2023)
623b2c7 - add overwrite flag (derpyplops, May 18, 2023)
256ad68 - add transfer to SweepByDsMultiplot (derpyplops, May 18, 2023)
9f9c5bb - Remove docstrings for consistency (derpyplops, May 18, 2023)
033e901 - remove vestigial .gitignore (derpyplops, May 18, 2023)
c176732 - remove burns datasets (derpyplops, May 18, 2023)
Edit README
derpyplops committed May 18, 2023
commit 68e4b52cd94d5121cbf55cdf977828417b75a1b3
71 changes: 55 additions & 16 deletions README.md
@@ -2,76 +2,115 @@

**WIP: This codebase is under active development**

Because language models are trained to predict the next token in naturally occurring text, they often reproduce common human errors and misconceptions, even when they "know better" in some sense. More worryingly, when models are trained to generate text that's rated highly by humans, they may learn to output false statements that human evaluators can't detect. We aim to circumvent this issue by directly [**eliciting latent knowledge**](https://docs.google.com/document/d/1WwsnJQstPq91_Yh-Ch2XRL8H_EpsnjrC1dwZXR37PC8/edit) (ELK) inside the activations of a language model.

Specifically, we're building on the **Contrastive Representation Clustering** (CRC) method described in the paper [Discovering Latent Knowledge in Language Models Without Supervision](https://arxiv.org/abs/2212.03827) by Burns et al. (2022). In CRC, we search for features in the hidden states of a language model which satisfy certain logical consistency requirements. It turns out that these features are often useful for question-answering and text classification tasks, even though the features are trained without labels.

### Quick Start

Our code is based on [PyTorch](https://pytorch.org) and [Huggingface Transformers](https://huggingface.co/docs/transformers/index). We test the code on Python 3.10 and 3.11.

First install the package with `pip install -e .` in the root directory, or `pip install -e .[dev]` if you'd like to contribute to the project (see **Development** section below). This should install all the necessary dependencies.

To fit reporters for the HuggingFace model `model` and dataset `dataset`, just run:

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb
```

This will automatically download the model and dataset, run the model and extract the relevant representations if they aren't cached on disk, fit reporters on them, and save the reporter checkpoints to the `elk-reporters` folder in your home directory. It will also evaluate the reporter classification performance on a held-out test set and save it to a CSV file in the same folder.
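
Once a run completes, you can look up its outputs by the run name it was assigned; a minimal sketch (`<run_name>` is a placeholder for whatever name your run received):

```bash
# Reporter checkpoints and evaluation CSVs produced by `elk elicit`
ls ~/elk-reporters/<run_name>/
```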

The following will generate a CCS (Contrast Consistent Search) reporter instead of the CRC-based reporter, which is the default.

```bash
elk elicit microsoft/deberta-v2-xxlarge-mnli imdb --net ccs
```

The following command will evaluate the probe from the run naughty-northcutt on the hidden states extracted from the model deberta-v2-xxlarge-mnli for the imdb dataset. It will result in an `eval.csv` and `cfg.yaml` file, which are stored under a subfolder in `elk-reporters/naughty-northcutt/transfer_eval`.

```bash
elk eval naughty-northcutt microsoft/deberta-v2-xxlarge-mnli imdb
```
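
After the evaluation finishes, the resulting files can be inspected on disk; a minimal sketch, reusing the run name from the example above:

```bash
# eval.csv and cfg.yaml are written under a subfolder of transfer_eval
ls -R ~/elk-reporters/naughty-northcutt/transfer_eval/
```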

The following runs `elicit` on the Cartesian product of the listed models and datasets, storing the results in a special folder `ELK_DIR/sweeps/<memorable_name>`. Moreover, `--add_pooled` adds an additional dataset that pools all of the datasets together. You can also add a `--visualize` flag to visualize the results of the sweep, as shown in the second example below.

```bash
elk sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled
```
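
For example, to run the same sweep and also generate its visualizations in one go (a sketch combining the command above with the `--visualize` flag described earlier):

```bash
elk sweep --models gpt2-{medium,large,xl} --datasets imdb amazon_polarity --add_pooled --visualize
```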

Running `elk plot` with no arguments will plot the results from the most recent sweep. If you want to plot a specific sweep, you can do so with:

```bash
elk plot {sweep_name}
```

## Caching

The hidden states resulting from `elk elicit` are cached as a HuggingFace dataset to avoid having to recompute them every time we want to train a probe. The cache is stored in the same place as all other HuggingFace datasets, which is usually `~/.cache/huggingface/datasets`.
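
If you need to reclaim disk space or force the hidden states to be re-extracted, you can remove the corresponding cache entries by hand; a minimal sketch, assuming the default cache location mentioned above (`<dataset_dir>` is a placeholder):

```bash
# Inspect the HuggingFace datasets cache holding the extracted hidden states
ls ~/.cache/huggingface/datasets
# Remove a specific cached dataset to force re-extraction on the next run
rm -rf ~/.cache/huggingface/datasets/<dataset_dir>
```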

## Development

Use `pip install pre-commit && pre-commit install` in the root folder before your first commit.
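If you want to check that the hooks pass before your first commit, you can also run them across the whole repository; a minimal sketch of the setup:

```bash
pip install pre-commit        # install the pre-commit tool
pre-commit install            # register the git hook for this repository
pre-commit run --all-files    # optionally, run all hooks against the full codebase once
```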

### Devcontainer

[![Open in Remote - Containers](https://img.shields.io/static/v1?label=Remote%20-%20Containers&message=Open&color=blue&logo=visualstudiocode)](https://vscode.dev/redirect?url=vscode:https://ms-vscode-remote.remote-containers/cloneInVolume?url=https://github.com/EleutherAI/elk)

### Run tests

```bash
pytest
```

### Run type checking
We use [pyright](https://github.com/microsoft/pyright), which is built into the VSCode editor. If you'd like to run it as a standalone tool, it requires a [nodejs installation](https://nodejs.org/en/download/).

```bash
pyright
```

### Run the linter

We use [ruff](https://beta.ruff.rs/docs/). It is installed as a pre-commit hook, so you don't have to run it manually.
If you want to run it manually, you can do so with:

```bash
ruff . --fix
```

### Contributing to this repository

If you work on a new feature, fix, or some other code task, make sure to create an issue and assign it to yourself (maybe even share it in the elk channel of Eleuther's Discord with a small note). That way, others know you are working on the issue and won't duplicate the work 👍 It also makes it easy for others to contact you.