Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Eval tests #204

Merged
merged 19 commits into from
Apr 25, 2023
Merged

Eval tests #204

merged 19 commits into from
Apr 25, 2023

Conversation

ChristyKoh
Copy link
Collaborator

@ChristyKoh ChristyKoh commented Apr 20, 2023

Addresses #186

@ChristyKoh
Copy link
Collaborator Author

The tests are structured as follows:

  1. setup_elicit creates and runs elk elicit with desired model/dataset
  2. eval_run creates and runs eval, can specify transfer datasets and ccs/vinc
  3. eval_assert_files_created checks:
  • vanilla eval results in modified eval.csv
  • transfer eval creates new eval subdirectory structure and files

Does this sound reasonable? It's currently failing to assert directory creation for transfer evals.

@CLAassistant
Copy link

CLAassistant commented Apr 23, 2023

CLA assistant check
All committers have signed the CLA.

@thejaminator
Copy link
Collaborator

The tests are structured as follows:

  1. setup_elicit creates and runs elk elicit with desired model/dataset
  2. eval_run creates and runs eval, can specify transfer datasets and ccs/vinc
  3. eval_assert_files_created checks:
  • vanilla eval results in modified eval.csv
  • transfer eval creates new eval subdirectory structure and files

Does this sound reasonable? It's currently failing to assert directory creation for transfer evals.

taking a look at it!

elk/files.py Outdated

def transfer_eval_directory(source: str) -> Path:
"""Return the directory where transfer evals are stored."""
return elk_reporter_dir() / source / "transfer_eval"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because tests need to know where it is
(and also we previously mispelled it as transfer_evals (with an s) rather than transfer_eval, haha

"reporters",
"eval.csv",
]

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i removed the elicit assertions cos thats handled by the elicit smoke tests

@@ -56,7 +49,7 @@ def check_contains_files(dir: Path, expected_files: list[str]):
assert file in created_file_names


def eval_run(elicit: Elicit, tfr_datasets: list[str] = None) -> int:
def eval_run(elicit: Elicit, transfer_datasets: Sequence[str] = []) -> int:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw its a sequence instead of a list cos list is mutable, so pyright will complain about mutable defaults. but with a protocol of sequence, pyright will complain if you try and mutate it, so that mutable default warning goes awa

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, thanks for explaining!!


eval = Eval(data=extract, source=tmp_path)
eval.execute()
return start_time_sec


def eval_assert_files_created(elicit: Elicit, start_time_sec=0):
def eval_assert_files_created(elicit: Elicit, transfer_datasets: Sequence[str] = []):
tmp_path = elicit.out_dir
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eval's execute function, on the latest branch, will only run for the specified transfer datasets so theres no self eval now

@thejaminator thejaminator marked this pull request as ready for review April 23, 2023 19:04
"""Extract hidden states from a model and return a `DatasetDict` containing them."""

def get_splits() -> SplitDict:
available_splits = assert_type(SplitDict, info.splits)
train_name, val_name = select_train_val_splits(available_splits)

pretty_name = colorize(assert_type(str, info.builder_name), highlight_color)
pretty_name = colorize(assert_type(str, ds_name), highlight_color)
print(
Copy link
Collaborator

@thejaminator thejaminator Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously info.builder_name can sometimes be none, e.g. christykoh/imdb has the builder name somehow... set to None. this caused an exception for such cases.

Copy link
Collaborator

@thejaminator thejaminator Apr 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

theres some things i tried, like setting info.builder_name to the path if builder_name doesn't exist

but theres some trippy shizzles going on (as you may have encountered before) with the builder name.

e.g. builder.download_and_prepare would reset the builder_name to the original info, even though we set it to something else in _GeneratorBuilder. This was quite annoying and hard to reason with, so i decided to just put the name in DatasetDictWithName.

in a hopeful future we may also stop using huggingface's dataset as well, so i didn't want to continue messing with it

include_config = config_name and has_multiple_configs(builder_name)
return builder_name + " " + config_name if include_config else builder_name


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also we don't need this anymore since we simply have the name in DatasetDictWithName. yay!

for tfr_dataset in transfer_datasets:
# assert that the dataset column contains the transfer dataset
ds_name, config_name = extract_dataset_name_and_config(tfr_dataset)
assert ds_name in dataset_col.values
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the test case of ["christykoh/imdb_pt", "super_glue boolq"]
results in a single csv under transfer_eval.
It will have the datasets of christykoh/imdb_pt and super_glue under the dataset column.

its a single csv rather than multiple folders, because thats the behavior due to #210. I didn't want to change to much and think that this behavior is ok.

transfer_eval_directory(self.source)
if self.out_dir is None
else self.out_dir
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

previously was wrongly specified to be transfer_dir / "+".join(self.data.prompts.datasets).
Also i only set it if it wasn't already specified. Seems like sweep wants to specify their own directory so I didn't want to always override it.

Copy link
Member

@norabelrose norabelrose left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@norabelrose norabelrose merged commit 756781f into main Apr 25, 2023
@norabelrose norabelrose deleted the eval_tests branch April 25, 2023 15:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants