Allow prompt templates to say if they should be binarized #234

norabelrose · 2023-05-02T09:21:47Z

Adds a `binarize` field to prompt template YAML files. This is more or less required for sweeps in which some datasets need to be binarized and others do not.
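A minimal sketch of how a per-template `binarize` flag could be read and applied. It is not elk's actual code: the YAML is represented as an already-parsed dict, `TemplateMeta`, `load_meta`, `binarize_example`, and the template name `ag_news_topic` are hypothetical, and the binarization strategy (keep the correct answer plus one randomly chosen wrong answer, shuffled) is an assumption.

```python
import random
from dataclasses import dataclass


@dataclass
class TemplateMeta:
    """Hypothetical parsed form of a prompt template YAML file."""
    name: str
    answer_choices: list[str]
    binarize: bool = False  # mirrors the new YAML field; absent means False


def load_meta(raw: dict) -> TemplateMeta:
    """Build TemplateMeta from a dict such as a YAML loader would return."""
    return TemplateMeta(
        name=raw["name"],
        answer_choices=list(raw["answer_choices"]),
        binarize=bool(raw.get("binarize", False)),
    )


def binarize_example(
    choices: list[str], label: int, rng: random.Random
) -> tuple[list[str], int]:
    """Keep the correct choice plus one random incorrect choice, in random order.

    Returns the two remaining choices and the index of the correct one.
    """
    wrong = rng.choice([i for i in range(len(choices)) if i != label])
    pair = [label, wrong]
    rng.shuffle(pair)
    return [choices[i] for i in pair], pair.index(label)


raw = {
    "name": "ag_news_topic",  # hypothetical template name
    "answer_choices": ["World", "Sports", "Business", "Sci/Tech"],
    "binarize": True,
}
meta = load_meta(raw)
rng = random.Random(0)
if meta.binarize:
    choices, label = binarize_example(meta.answer_choices, label=2, rng=rng)
    print(choices, label)
```

Templates that omit the field default to `binarize: False` and keep all of their answer choices, which is what allows one sweep to mix binarized and non-binarized datasets.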

lauritowal · 2023-05-02T15:04:32Z

Getting errors right now:

(.venv) laurito@ipe-monster:~/elk$ elk sweep --models gpt2 --datasets ag_news --max_examples 10 10 --num_gpus 1
Starting sweep over 1 models and 1 datasets (1 runs)
Models: ['gpt2']
Datasets: ['ag_news']
Saving sweep results to /home/wombat_share/laurito/elk_reporters/sweeps/stoic-merkle
===== gpt2 (1 of 1) =====
Using 1 of 7 GPUs: [1]
ag_news using 'train' for training and 'test' for validation
Found cached dataset generator (/home/wombat_share/laurito/.hugginface/datasets/generator/default-b9aa46e9e0062f0d/0.0.0)
Found cached dataset generator (/home/wombat_share/laurito/.hugginface/datasets/generator/default-3941ce3946a68d45/0.0.0)
Output directory at /home/wombat_share/laurito/elk_reporters/sweeps/stoic-merkle/gpt2/ag_news
  0%|                                                                                                                                                                        | 0/13 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/home/laurito/elk/.venv/bin/elk", line 8, in <module>
    sys.exit(run())
  File "/home/laurito/elk/elk/__main__.py", line 27, in run
    run.execute()
  File "/home/laurito/elk/elk/__main__.py", line 19, in execute
    return self.command.execute()
  File "/home/laurito/elk/elk/training/sweep.py", line 81, in execute
    run.execute()
  File "/home/laurito/elk/elk/run.py", line 98, in execute
    self.apply_to_layers(func=func, num_devices=num_devices)
  File "/home/laurito/elk/elk/run.py", line 182, in apply_to_layers
    for df_dict in tqdm(mapper(func, layers), total=len(layers)):
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/tqdm/std.py", line 1178, in __iter__
    for obj in iterable:
  File "/home/laurito/elk/elk/training/train.py", line 57, in apply_to_layer
    train_dict = self.prepare_data(device, layer, "train")
  File "/home/laurito/elk/elk/run.py", line 138, in prepare_data
    val_h = int16_to_float32(assert_type(Tensor, split[f"hidden_{layer}"]))
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2779, in __getitem__
    return self._getitem(key)
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/arrow_dataset.py", line 2764, in _getitem
    formatted_output = format_table(
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 624, in format_table
    return formatter(pa_table, query_type=query_type)
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 398, in __call__
    return self.format_column(pa_table)
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/torch_formatter.py", line 86, in format_column
    column = self.numpy_arrow_extractor().extract_column(pa_table)
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 161, in extract_column
    return self._arrow_array_to_numpy(pa_table[pa_table.column_names[0]])
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 171, in _arrow_array_to_numpy
    array: List = [
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/formatting/formatting.py", line 172, in <listcomp>
    row for chunk in pa_array.chunks for row in chunk.to_numpy(zero_copy_only=zero_copy_only)
  File "/home/laurito/elk/.venv/lib/python3.10/site-packages/datasets/features/features.py", line 726, in to_numpy
    numpy_arr = numpy_arr.reshape(len(self) - len(null_indices), *self.type.shape)
ValueError: cannot reshape array of size 230400 into shape (10,15,4,768)

Having a look at it too now
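The numbers in the `ValueError` can be checked directly. In the failing call, `reshape(len(self) - len(null_indices), *self.type.shape)` makes 10 the row count (matching `--max_examples 10`) and (15, 4, 768) the declared per-row shape. 768 is GPT-2's hidden size and ag_news has four classes; reading the axes as (prompt variants, answer choices, hidden size) is an assumption, not something stated in the thread.

```python
from math import prod

# Numbers copied from the ValueError in the traceback above.
num_rows = 10                      # rows in the split (matches --max_examples 10)
declared_row_shape = (15, 4, 768)  # per-row shape the dataset features declare
actual_size = 230_400              # elements actually stored

expected_size = num_rows * prod(declared_row_shape)
print(expected_size)                # 460800
print(actual_size / expected_size)  # 0.5

# Assumption: the row axes are (prompt variants, answer choices, hidden size).
# Solve for the answer-choice axis implied by the stored data:
variants, _, hidden = declared_row_shape
implied_choices = actual_size // (num_rows * variants * hidden)
print(implied_choices)  # 2
```

Under that reading, the stored hidden states hold two answer choices per example (i.e. binarized) while the declared feature shape still assumes all four ag_news classes.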

lauritowal

Getting the above error right now; I can have a look into it.

lauritowal

Never mind, I think running with the disable-cache flag made it work now.

lauritowal

Okay, no, I'm still getting the same error.

AlexTMallen

LGTM!


Allow prompt templates to say if they should be binarized

4cde92e

norabelrose requested review from lauritowal and AlexTMallen May 2, 2023 09:21

lauritowal requested changes May 2, 2023

lauritowal approved these changes May 2, 2023

lauritowal requested changes May 2, 2023

Fix dataset features bug

a160090

AlexTMallen approved these changes May 2, 2023

lauritowal approved these changes May 2, 2023

norabelrose merged commit 46cf123 into main May 2, 2023

norabelrose deleted the binarize-yaml branch May 2, 2023 19:42

adzcai pushed a commit to adzcai/elk that referenced this pull request May 4, 2023

Allow prompt templates to say if they should be binarized (EleutherAI…

f813643

…#234) * Allow prompt templates to say if they should be binarized * Fix dataset features bug
