
Multiple datasets refactor #189

Merged
merged 17 commits into from
Apr 14, 2023
Conversation

norabelrose
Member

@norabelrose norabelrose commented Apr 13, 2023

This PR refactors various parts of the code in order to handle the training of reporters on multiple datasets simultaneously. We kinda-sorta "supported" this before but not correctly.

Before, we tried to merge the hidden states from all the datasets into a single HF dataset. This caused a lot of problems; for one thing, it meant that the hiddens weren't cached properly. For example, if you extracted hiddens for imdb and then wanted to fit a reporter on both imdb and amazon_polarity, the cached imdb hiddens wouldn't be reused. Now we do use the cache in cases like this. To accomplish this, I needed to change how reporters are trained and evaluated: the training code now passes around dictionaries where the keys are dataset names and the values are tuples of hiddens and labels. This part could probably be cleaned up a bit further, but I think it's a decent MVP.
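The dictionary layout described above can be sketched as follows (a minimal illustration; the dataset sizes, shapes, and variable names are made up, not the actual elk types):

```python
import torch

# Hypothetical layout: dataset name -> (hiddens, labels).
# Illustrative shape: (num_examples, num_variants, num_choices, hidden_dim).
train_dict = {
    "imdb": (torch.randn(100, 2, 2, 512), torch.randint(0, 2, (100,))),
    "amazon_polarity": (torch.randn(100, 2, 2, 512), torch.randint(0, 2, (100,))),
}

# Training and eval code can iterate per dataset instead of merging
# everything into one HF dataset, so each dataset's cache stays valid.
for name, (hiddens, labels) in train_dict.items():
    assert hiddens.shape[0] == labels.shape[0]
```

This also makes it natural for eval.csv to report per-dataset metrics, since each dataset keeps its own identity through the pipeline.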

Now, eval.csv reports the reporter's metrics separately for each dataset instead of reporting pooled metrics. I take it this is much more useful than the pooled numbers, but maybe we should include those as well, idk.

There are also a few little fixes/enhancements in here, most notably that I switched to using torch.linalg.eigh by default instead of truncated_eigh for the time being. This makes me sad, since I put a lot of work into truncated_eigh, but it fails to converge far too often and prints annoying warnings when that happens. Hopefully we'll be able to revive it eventually.
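For reference, pulling the top eigenpair out of a full torch.linalg.eigh decomposition looks roughly like this (a standalone sketch, not the actual VINC code; the trade-off is that eigh computes the full O(n³) decomposition but converges reliably, whereas a truncated method only computes the leading eigenvectors):

```python
import torch

# eigh assumes a symmetric (Hermitian) matrix, so symmetrize first.
A = torch.randn(8, 8)
A = A + A.T

# Eigenvalues come back in ascending order, so the top eigenpair is last.
eigvals, eigvecs = torch.linalg.eigh(A)
top_val, top_vec = eigvals[-1], eigvecs[:, -1]
```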

This PR also fixes #96, in a different way from PR #170. Basically I needed normalization to be a property of Reporters in order to get multiple datasets to work in a reasonable way. Essentially I don't do any explicit normalization for VINC, since it doesn't actually need it (we already de-mean representations from different classes separately), whereas for CCS I use a novel Normalizer class which is probably slightly over-engineered but whatever.
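A rough idea of a fit-then-apply normalizer in this spirit (hypothetical sketch under assumed shapes; the Normalizer class in the PR is likely more elaborate):

```python
import torch

class Normalizer:
    """Fit per-feature statistics once, then apply them to any tensor."""

    def fit(self, x: torch.Tensor) -> "Normalizer":
        self.mean = x.mean(dim=0)
        self.std = x.std(dim=0).clamp_min(1e-6)  # guard against zero variance
        return self

    def __call__(self, x: torch.Tensor) -> torch.Tensor:
        return (x - self.mean) / self.std
```

Making this a property of the reporter means the same statistics fitted at train time travel with the checkpoint and get reapplied at eval time, which is what multi-dataset transfer eval needs.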

Collaborator

@AlexTMallen AlexTMallen left a comment

Looks good!

Things blocking approval:

  • Logging needs to be updated to handle multiple datasets
  • Training is very slow (is this just because we're not truncating the eigendecomposition?)
  • --skip_baseline flag isn't used in training
  • (Lower priority) normalization of reporter isn't used for pseudo_auroc

elk/extraction/prompt_loading.py
print(f"Using '{train_name}' for training and '{val_name}' for validation")

print(
# Cyan color for dataset name
Collaborator

this is a nice touch :)

elk/training/train.py

if isinstance(self.cfg.net, CcsReporterConfig):
reporter = CcsReporter(x0.shape[-1], self.cfg.net, device=device)
assert len(train_dict) == 1, "CCS only supports single-task training"
Collaborator

While this isn't high priority, we could be a little more lenient than this and allow training on mixtures of datasets when the shapes of the tensors are the same (same num_variants, num_choices).
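That leniency could amount to a simple shape check before pooling, e.g. (hypothetical helper; `can_pool` is not part of the codebase):

```python
import torch

def can_pool(train_dict: dict[str, tuple[torch.Tensor, torch.Tensor]]) -> bool:
    """True iff all datasets share (num_variants, num_choices, hidden_dim)."""
    shapes = {hiddens.shape[1:] for hiddens, _ in train_dict.values()}
    return len(shapes) == 1
```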

elk/training/train.py
Collaborator

@AlexTMallen AlexTMallen left a comment

LGTM! (I also made a small commit just before reviewing, someone should quickly check it)

Collaborator

@lauritowal lauritowal left a comment

Getting the following error when running eval right now:

(.venv) laurito@ipe-bison:~/elk$ elk eval /home/laurito/elk-reporters/gpt2/imdb\,\ super_glue\ boolq/gracious-mendel/ gpt2 imdb ag_news --num_gpus 1 --max_examples 100
Using 1 of 2 GPUs: [1]
imdb: using 'train' for training and 'test' for validation
Found cached dataset generator (/home/laurito/.cache/huggingface/datasets/generator/default-4e6ce9115ec3adc4/0.0.0)
Found cached dataset generator (/home/laurito/.cache/huggingface/datasets/generator/default-58da717a77241cc1/0.0.0)
Using 1 of 2 GPUs: [1]
ag_news: using 'train' for training and 'test' for validation
Found cached dataset generator (/home/laurito/.cache/huggingface/datasets/generator/default-0d15ca1ac6cd44d9/0.0.0)
Found cached dataset generator (/home/laurito/.cache/huggingface/datasets/generator/default-31f2909bb9ce8ebf/0.0.0)
Output directory at /home/laurito/elk-reporters/gpt2/imdb, super_glue boolq/gracious-mendel/transfer_eval/imdb
Using 1 of 2 GPUs: [1]
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 12/12 [00:01<00:00,  8.46it/s]
Traceback (most recent call last):
  File "/home/laurito/elk/.venv/bin/elk", line 8, in <module>
    sys.exit(run())
             ^^^^^
  File "/home/laurito/elk/elk/__main__.py", line 26, in run
    run.execute()
  File "/home/laurito/elk/elk/__main__.py", line 18, in execute
    return self.command.execute()
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/laurito/elk/elk/evaluation/evaluate.py", line 53, in execute
    run.evaluate()
  File "/home/laurito/elk/elk/evaluation/evaluate.py", line 111, in evaluate
    self.apply_to_layers(func=func, num_devices=num_devices)
  File "/home/laurito/elk/elk/run.py", line 151, in apply_to_layers
    df = pd.concat(df_buf).sort_values(by="layer")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/laurito/elk/.venv/lib/python3.11/site-packages/pandas/core/frame.py", line 6766, in sort_values
    k = self._get_label_or_level_values(by, axis=axis)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/laurito/elk/.venv/lib/python3.11/site-packages/pandas/core/generic.py", line 1778, in _get_label_or_level_values
    raise KeyError(key)
KeyError: 'layer'

Didn't look into it in more detail, yet.
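The KeyError indicates that none of the concatenated frames carries a "layer" column, since pd.concat only raises on sort_values when the key is absent from the combined result. A defensive version of the failing line in elk/run.py might look like this (a sketch, not necessarily how the fix landed):

```python
import pandas as pd

# Illustrative stand-in for the per-layer result frames in elk/run.py.
df_buf = [
    pd.DataFrame({"layer": [1, 0], "auroc": [0.7, 0.6]}),
]

df = pd.concat(df_buf)
# sort_values raises KeyError: 'layer' when no frame has the column,
# so only sort when it actually exists.
if "layer" in df.columns:
    df = df.sort_values(by="layer")
```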

@norabelrose
Member Author

> Getting the following error when running eval right now: … KeyError: 'layer'

Fixed

@norabelrose norabelrose merged commit 16dc1ca into main Apr 14, 2023
@norabelrose norabelrose deleted the multi-ds-eval branch April 14, 2023 23:43

Successfully merging this pull request may close these issues.

Make normalization a property of the Reporter
3 participants