Eval tests #204

ChristyKoh · 2023-04-20T07:40:33Z

Addresses #186

ChristyKoh · 2023-04-20T18:45:50Z

The tests are structured as follows:

- setup_elicit creates and runs elk elicit with desired model/dataset
- eval_run creates and runs eval, can specify transfer datasets and ccs/vinc
- eval_assert_files_created checks:
  - vanilla eval results in modified eval.csv
  - transfer eval creates new eval subdirectory structure and files

Does this sound reasonable? It's currently failing to assert directory creation for transfer evals.
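
As a rough illustration of the act/assert split described above, here is a minimal sketch. `fake_transfer_eval` is a hypothetical stand-in for actually running elk eval (it only writes an empty file), the `transfer_eval/eval.csv` layout is assumed for illustration, and the body of `check_contains_files` is a guess built around the single assert line visible in the diff.

```python
import tempfile
from pathlib import Path


def check_contains_files(dir: Path, expected_files: list[str]) -> None:
    # Assumed body: collect the names directly under `dir` and assert on each one.
    created_file_names = {p.name for p in dir.iterdir()}
    for file in expected_files:
        assert file in created_file_names, f"missing {file}"


def fake_transfer_eval(out_dir: Path) -> Path:
    # Hypothetical "act" step: a real test would run elk eval here instead.
    target = out_dir / "transfer_eval"
    target.mkdir(parents=True, exist_ok=True)
    (target / "eval.csv").write_text("dataset\n")
    return target


def smoke_test() -> None:
    # Arrange a scratch directory, act, then assert on what was created.
    with tempfile.TemporaryDirectory() as tmp:
        target = fake_transfer_eval(Path(tmp))
        check_contains_files(Path(tmp), ["transfer_eval"])
        check_contains_files(target, ["eval.csv"])
```

Keeping the assert step separate from the act step means the same file check can be reused for vanilla and transfer runs.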

CLAassistant · 2023-04-23T02:55:12Z

All committers have signed the CLA.

thejaminator · 2023-04-23T09:06:05Z

> The tests are structured as follows:
>
> - setup_elicit creates and runs elk elicit with desired model/dataset
> - eval_run creates and runs eval, can specify transfer datasets and ccs/vinc
> - eval_assert_files_created checks:
>   - vanilla eval results in modified eval.csv
>   - transfer eval creates new eval subdirectory structure and files
>
> Does this sound reasonable? It's currently failing to assert directory creation for transfer evals.

taking a look at it!

thejaminator · 2023-04-23T19:00:58Z

elk/files.py

+
+def transfer_eval_directory(source: str) -> Path:
+    """Return the directory where transfer evals are stored."""
+    return elk_reporter_dir() / source / "transfer_eval"


Because tests need to know where it is.
(Also, we previously misspelled it as transfer_evals, with an s, rather than transfer_eval, haha.)
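
A minimal sketch of why a shared helper is useful to the tests: the code and the test both derive the path from one function, so a spelling drift like transfer_evals vs. transfer_eval cannot happen silently. `elk_reporter_dir` here is a stand-in with an assumed default location, and `expected_eval_csv` is a hypothetical test helper, not something from the PR.

```python
from pathlib import Path


def elk_reporter_dir() -> Path:
    # Stand-in for the real helper in elk/files.py; this default path is an assumption.
    return Path.home() / "elk-reporters"


def transfer_eval_directory(source: str) -> Path:
    """Return the directory where transfer evals are stored."""
    return elk_reporter_dir() / source / "transfer_eval"


def expected_eval_csv(source: str) -> Path:
    # Hypothetical test helper: tests build expected paths from the same function
    # the evaluation code uses, rather than hard-coding the directory name.
    return transfer_eval_directory(source) / "eval.csv"
```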

thejaminator · 2023-04-23T19:01:28Z

tests/test_smoke_eval.py

- "reporters",
- "eval.csv",
-]
-


I removed the elicit assertions because that's handled by the elicit smoke tests.

thejaminator · 2023-04-23T19:02:24Z

tests/test_smoke_eval.py

@@ -56,7 +49,7 @@ def check_contains_files(dir: Path, expected_files: list[str]):
         assert file in created_file_names


-def eval_run(elicit: Elicit, tfr_datasets: list[str] = None) -> int:
+def eval_run(elicit: Elicit, transfer_datasets: Sequence[str] = []) -> int:


btw it's a Sequence instead of a list because list is mutable, so pyright will complain about mutable defaults. But with a read-only interface like Sequence, pyright will complain if you try to mutate it, so that mutable-default warning goes away.

I see, thanks for explaining!!
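
For readers unfamiliar with the pitfall being discussed, here is a small self-contained sketch. Both function names are hypothetical and not from the PR. The first function shows the runtime problem with a mutable default; the second shows the read-only `Sequence` annotation, here paired with an empty tuple default, which is one common way to sidestep the issue entirely (the PR itself keeps `[]` as the default).

```python
from collections.abc import Sequence


def append_bad(item: str, acc: list[str] = []) -> list[str]:
    # The default list is created once, at function definition time,
    # so every call that omits `acc` mutates the same shared object.
    acc.append(item)
    return acc


def describe(datasets: Sequence[str] = ()) -> str:
    # Sequence exposes no mutating methods such as append, so a type checker
    # flags attempts to modify `datasets`; the tuple default is immutable too.
    return ", ".join(datasets) if datasets else "<none>"
```

Callers can still pass a plain list to `describe`, since list satisfies `Sequence`.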

thejaminator · 2023-04-23T19:03:01Z

tests/test_smoke_eval.py


     eval = Eval(data=extract, source=tmp_path)
     eval.execute()
     return start_time_sec
 
 
-def eval_assert_files_created(elicit: Elicit, start_time_sec=0):
+def eval_assert_files_created(elicit: Elicit, transfer_datasets: Sequence[str] = []):
     tmp_path = elicit.out_dir


Eval's execute function, on the latest branch, only runs for the specified transfer datasets, so there's no self eval now.

thejaminator · 2023-04-25T13:31:28Z

elk/extraction/extraction.py

 """Extract hidden states from a model and return a `DatasetDict` containing them."""

     def get_splits() -> SplitDict:
         available_splits = assert_type(SplitDict, info.splits)
         train_name, val_name = select_train_val_splits(available_splits)

-        pretty_name = colorize(assert_type(str, info.builder_name), highlight_color)
+        pretty_name = colorize(assert_type(str, ds_name), highlight_color)
         print(


Previously info.builder_name could sometimes be None, e.g. christykoh/imdb somehow has its builder name set to None. This caused an exception in such cases.

There are some things I tried, like setting info.builder_name to the path if builder_name doesn't exist,

but there's some trippy stuff going on (as you may have encountered before) with the builder name.

E.g. builder.download_and_prepare would reset the builder_name to the original info, even though we set it to something else in _GeneratorBuilder. This was quite annoying and hard to reason about, so I decided to just put the name in DatasetDictWithName.

In a hopeful future we may also stop using Hugging Face's datasets, so I didn't want to continue messing with it.
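
The idea of carrying the name next to the data, instead of reading it back from builder metadata, can be sketched roughly as follows. The field names and the plain-dict payload are assumptions for illustration; the real DatasetDictWithName in the PR may be shaped differently, and `display_name` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class DatasetDictWithName:
    # Sketch only: the name is supplied by whoever loads the dataset,
    # so it does not depend on metadata that a builder may reset or leave as None.
    name: str
    dataset: dict[str, Any]


def display_name(ds: DatasetDictWithName) -> str:
    # No Optional handling is needed because `name` is a required field.
    return ds.name
```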

thejaminator · 2023-04-25T13:32:00Z

elk/utils/data_utils.py

- include_config = config_name and has_multiple_configs(builder_name)
- return builder_name + " " + config_name if include_config else builder_name
-
-


Also, we don't need this anymore since we simply have the name in DatasetDictWithName. Yay!

thejaminator · 2023-04-25T13:36:51Z

tests/test_smoke_eval.py

+    for tfr_dataset in transfer_datasets:
+        # assert that the dataset column contains the transfer dataset
+        ds_name, config_name = extract_dataset_name_and_config(tfr_dataset)
+        assert ds_name in dataset_col.values


So the test case of ["christykoh/imdb_pt", "super_glue boolq"] results in a single csv under transfer_eval.
It will have christykoh/imdb_pt and super_glue under the dataset column.

It's a single csv rather than multiple folders because that's the behavior due to #210. I didn't want to change too much, and I think this behavior is OK.
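
A rough sketch of that check using only the standard library. The implementation of `extract_dataset_name_and_config` below is an assumption (split on the first space), `datasets_in_csv` is a hypothetical helper, and the CSV contents in the usage note are made up purely to show the shape of the assertion.

```python
import csv
import io


def extract_dataset_name_and_config(spec: str) -> tuple[str, str]:
    # Assumed behavior: "super_glue boolq" -> ("super_glue", "boolq");
    # a spec without a space yields an empty config string.
    name, _, config = spec.partition(" ")
    return name, config


def datasets_in_csv(csv_text: str) -> set[str]:
    # Collect the distinct values of the "dataset" column.
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["dataset"] for row in reader}


def assert_transfer_datasets_present(csv_text: str, transfer_datasets: list[str]) -> None:
    present = datasets_in_csv(csv_text)
    for tfr_dataset in transfer_datasets:
        ds_name, _config_name = extract_dataset_name_and_config(tfr_dataset)
        assert ds_name in present, f"{ds_name} missing from dataset column"
```

For example, a CSV whose dataset column holds christykoh/imdb_pt and super_glue would pass for the two-dataset test case, since only the dataset name (not the config) is compared.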

thejaminator · 2023-04-25T13:38:56Z

elk/evaluation/evaluate.py

+ transfer_eval_directory(self.source)
+ if self.out_dir is None
+ else self.out_dir
+ )


Previously this was wrongly specified as transfer_dir / "+".join(self.data.prompts.datasets).
Also, I only set it if it wasn't already specified. It seems sweep wants to specify its own directory, so I didn't want to always override it.
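
The "only set it when unspecified" behavior can be sketched like this. `EvalSketch` and `resolved_out_dir` are hypothetical names, and `elk_reporter_dir` is a stand-in with an assumed default; only the conditional mirrors the diff above.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def elk_reporter_dir() -> Path:
    # Stand-in for the real helper; this default path is an assumption.
    return Path.home() / "elk-reporters"


def transfer_eval_directory(source: str) -> Path:
    return elk_reporter_dir() / source / "transfer_eval"


@dataclass
class EvalSketch:
    source: str
    out_dir: Optional[Path] = None

    def resolved_out_dir(self) -> Path:
        # Fall back to the default location only when the caller
        # (for example a sweep) has not chosen a directory itself.
        return (
            transfer_eval_directory(self.source)
            if self.out_dir is None
            else self.out_dir
        )
```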

norabelrose

LGTM

ChristyKoh and others added 2 commits April 20, 2023 07:39

draft eval smoke tests

7c2a5e4

[pre-commit.ci] auto fixes from pre-commit.com hooks

3947bba

for more information, see https://pre-commit.ci

thejaminator reviewed Apr 20, 2023

ChristyKoh and others added 6 commits April 20, 2023 15:02

rm extraneous utils

3f64a6e

Merge branch 'eval_tests' of github.com:EleutherAI/elk into eval_tests

59ca19e

simplify eig smoke test

7287757

Merge branch 'main' into eval_tests

53088cc

fix prompt update bug, failing tests

7ba78f1

[pre-commit.ci] auto fixes from pre-commit.com hooks

453370f

ChristyKoh and others added 4 commits April 20, 2023 18:46

split eval_run into act and assert

2fb36c6

[pre-commit.ci] auto fixes from pre-commit.com hooks

d4eb57e

prompt loading return

396753c

[pre-commit.ci] auto fixes from pre-commit.com hooks

f8fb574

thejaminator reviewed Apr 23, 2023

thejaminator marked this pull request as ready for review April 23, 2023 19:04

Repush commits under [email protected]

e291ab4

thejaminator force-pushed the eval_tests branch from d3e9ae4 to e291ab4 Compare April 25, 2023 05:11

thejaminator and others added 6 commits April 25, 2023 13:16

Merge remote-tracking branch 'origin/main' into eval_tests

431fefc

# Conflicts: # elk/evaluation/evaluate.py # elk/extraction/extraction.py # elk/run.py # elk/training/train.py # elk/utils/__init__.py

fix the dataset name

5b6c828

fix other datasetdict functions

408b8d3

[pre-commit.ci] auto fixes from pre-commit.com hooks

68f41be

remove unused class

73640a5

[pre-commit.ci] auto fixes from pre-commit.com hooks

625d9e8

thejaminator reviewed Apr 25, 2023

norabelrose approved these changes Apr 25, 2023

norabelrose merged commit 756781f into main Apr 25, 2023

norabelrose deleted the eval_tests branch April 25, 2023 15:59

derpyplops mentioned this pull request May 22, 2023

Smoke tests for elk eval command #186

Closed
