-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Eval tests #204
Eval tests #204
Conversation
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
The tests are structured as follows:
Does this sound reasonable? It's currently failing to assert directory creation for transfer evals. |
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
taking a look at it! |
elk/files.py
Outdated
|
||
def transfer_eval_directory(source: str) -> Path: | ||
"""Return the directory where transfer evals are stored.""" | ||
return elk_reporter_dir() / source / "transfer_eval" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because tests need to know where it is
(and also we previously mispelled it as transfer_evals
(with an s) rather than transfer_eval
, haha
tests/test_smoke_eval.py
Outdated
"reporters", | ||
"eval.csv", | ||
] | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i removed the elicit assertions cos thats handled by the elicit smoke tests
@@ -56,7 +49,7 @@ def check_contains_files(dir: Path, expected_files: list[str]): | |||
assert file in created_file_names | |||
|
|||
|
|||
def eval_run(elicit: Elicit, tfr_datasets: list[str] = None) -> int: | |||
def eval_run(elicit: Elicit, transfer_datasets: Sequence[str] = []) -> int: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
btw its a sequence instead of a list cos list is mutable, so pyright will complain about mutable defaults. but with a protocol of sequence, pyright will complain if you try and mutate it, so that mutable default warning goes awa
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see, thanks for explaining!!
|
||
eval = Eval(data=extract, source=tmp_path) | ||
eval.execute() | ||
return start_time_sec | ||
|
||
|
||
def eval_assert_files_created(elicit: Elicit, start_time_sec=0): | ||
def eval_assert_files_created(elicit: Elicit, transfer_datasets: Sequence[str] = []): | ||
tmp_path = elicit.out_dir |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
eval's execute function, on the latest branch, will only run for the specified transfer datasets so theres no self eval now
d3e9ae4
to
e291ab4
Compare
# Conflicts: # elk/evaluation/evaluate.py # elk/extraction/extraction.py # elk/run.py # elk/training/train.py # elk/utils/__init__.py
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
"""Extract hidden states from a model and return a `DatasetDict` containing them.""" | ||
|
||
def get_splits() -> SplitDict: | ||
available_splits = assert_type(SplitDict, info.splits) | ||
train_name, val_name = select_train_val_splits(available_splits) | ||
|
||
pretty_name = colorize(assert_type(str, info.builder_name), highlight_color) | ||
pretty_name = colorize(assert_type(str, ds_name), highlight_color) | ||
print( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously info.builder_name
can sometimes be none, e.g. christykoh/imdb
has the builder name somehow... set to None. this caused an exception for such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
theres some things i tried, like setting info.builder_name to the path if builder_name doesn't exist
but theres some trippy shizzles going on (as you may have encountered before) with the builder name.
e.g. builder.download_and_prepare
would reset the builder_name to the original info, even though we set it to something else in _GeneratorBuilder
. This was quite annoying and hard to reason with, so i decided to just put the name in DatasetDictWithName
.
in a hopeful future we may also stop using huggingface's dataset as well, so i didn't want to continue messing with it
include_config = config_name and has_multiple_configs(builder_name) | ||
return builder_name + " " + config_name if include_config else builder_name | ||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also we don't need this anymore since we simply have the name in DatasetDictWithName. yay!
for tfr_dataset in transfer_datasets: | ||
# assert that the dataset column contains the transfer dataset | ||
ds_name, config_name = extract_dataset_name_and_config(tfr_dataset) | ||
assert ds_name in dataset_col.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so the test case of ["christykoh/imdb_pt", "super_glue boolq"]
results in a single csv under transfer_eval
.
It will have the datasets of christykoh/imdb_pt
and super_glue
under the dataset column.
its a single csv rather than multiple folders, because thats the behavior due to #210. I didn't want to change to much and think that this behavior is ok.
transfer_eval_directory(self.source) | ||
if self.out_dir is None | ||
else self.out_dir | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
previously was wrongly specified to be transfer_dir / "+".join(self.data.prompts.datasets)
.
Also i only set it if it wasn't already specified. Seems like sweep wants to specify their own directory so I didn't want to always override it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Addresses #186