
implement test cases for reproducing the results #19

Closed

Conversation

mariopenglee (Contributor)

Only dataframe comparison for now; evaluations still need to be added.

Comment on lines 41 to 42
assert row["accuracy"] <= current_performance.iloc[index]["accuracy"]
assert row["std"] <= current_performance.iloc[index]["std"]
Collaborator

I guess we should maybe add some +/- epsilon (e.g. a value of 3) to increase the tolerance a bit?

Member

Yeah, probably. We should make sure the random seed is fixed and then just see empirically how big the discrepancy is with the current code.
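
A quick sketch of how that discrepancy could be measured once the seed is fixed (a minimal helper, assuming the dataframe and column names from the test code quoted above; the function name is made up):

import pandas as pd


def max_discrepancy(benchmark: pd.DataFrame, current: pd.DataFrame) -> pd.Series:
    # Largest absolute per-column deviation between the benchmark csv and the
    # current run, for the columns compared in the assertions above.
    return (current[["accuracy", "std"]] - benchmark[["accuracy", "std"]]).abs().max()

Running this once per model/dataset pair would show how large an epsilon is actually needed.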

@lauritowal changed the title from "implement test case for confusion" to "implement test cases for reproducing the results" on Feb 3, 2023
models_layer_num = default_config["models_layer_num"]


def train_and_evaluate(model_name, dataset):
Collaborator

We want to be able to train on one dataset and evaluate either on the same or another dataset.
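
A minimal sketch of what that signature could look like, assuming we keep the subprocess approach for now; the evaluation entry point and its flags below are placeholders, not the project's confirmed CLI:

import subprocess


def train_and_evaluate(model_name, train_dataset, eval_dataset=None):
    # Train on train_dataset, then evaluate on eval_dataset
    # (falls back to the training dataset when none is given).
    if eval_dataset is None:
        eval_dataset = train_dataset

    # Training command copied from the diff in this PR.
    training_command = (
        "python -m elk.train "
        f"--model {model_name} --prefix normal --dataset {train_dataset} "
        "--num_data 1000 --seed 0"
    )
    subprocess.run(training_command, shell=True, check=True)

    # Placeholder: substitute the project's real evaluation entry point here,
    # or call the evaluation function directly as discussed further down.
    evaluation_command = (
        "python -m elk.evaluate "
        f"--model {model_name} --prefix normal --dataset {eval_dataset} "
        "--num_data 1000 --seed 0"
    )
    subprocess.run(evaluation_command, shell=True, check=True)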

train_and_evaluate(model_name, dataset)

# Read the contents of the csv files into pandas dataframes
benchmark_performance = pd.read_csv(f"../test_data/{model_name}_confusion_0.csv")
Collaborator

To be able to reproduce the results of the paper, we mostly want the prefix "normal", not "confusion" (see the parser file). However, it might make sense to have an option to select between the different prefixes for the test.
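
One way to make the prefix selectable, as a sketch (the helper name and default are assumptions; the file-name pattern follows the line quoted above):

import pandas as pd


def load_benchmark(model_name: str, prefix: str = "normal") -> pd.DataFrame:
    # "normal" reproduces the paper's setting; other prefixes such as
    # "confusion" can still be selected explicitly for extra test variants.
    return pd.read_csv(f"../test_data/{model_name}_{prefix}_0.csv")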


# Compare the contents of the dataframes,
# make sure current performance is within threshold (epsilon) or better
epsilon = 3
Collaborator

It would be better to pass epsilon in as a parameter instead of hard-coding it here.
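
For example (function and argument names are illustrative, not part of the existing test; the direction of the std check is an assumption, taking lower std as better):

def compare_performance(benchmark, current, epsilon=3):
    # Assert that the current run is within epsilon of the benchmark, or better.
    for index, row in benchmark.iterrows():
        # Accuracy may fall below the benchmark by at most epsilon.
        assert current.iloc[index]["accuracy"] >= row["accuracy"] - epsilon
        # Std may exceed the benchmark by at most epsilon.
        assert current.iloc[index]["std"] <= row["std"] + epsilon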

Train and evaluate the model with the given dataset.
"""
training_command = "python -m elk.train "
f"--model {model_name} --prefix normal --dataset {dataset} --num_data 1000 --seed 0"
Collaborator

Maybe it would be better to call the functions from train.py and evaluate.py directly?
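
A hypothetical sketch of that; the import path, entry-point name, and argument object are assumptions that would need to be checked against train.py and evaluate.py in this repo:

from types import SimpleNamespace

from elk import train  # assumed module layout; adjust to the actual package


def train_directly(model_name, dataset):
    # Mirror the CLI flags used above as an argument object; the real entry
    # point may instead expect an argparse.Namespace, keyword arguments, or a
    # config object.
    args = SimpleNamespace(
        model=model_name, prefix="normal", dataset=dataset, num_data=1000, seed=0
    )
    train.main(args)  # assumed entry-point name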

benchmark_performance = pd.read_csv(f"../test_data/{model_name}_confusion_0.csv")
@lauritowal (Collaborator), Feb 4, 2023

You might want to create this csv file (probably manually) and add it to the repo, based on the format/structure of the current .csv there. It should contain all the different combinations from the table in the paper (for logistic regression and CCS only): https://ar5iv.labs.arxiv.org/html/2212.03827/assets/x8.png
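
A hedged sketch of how such a file could be assembled; the column names and the <model>/<dataset> values are placeholders to be replaced with the actual combinations and numbers from the paper's table:

import pandas as pd

# Placeholder rows only: add one row per model/dataset combination from the
# paper's table, for CCS and logistic regression (LR), with the reported
# accuracy and std values filled in.
rows = [
    {"model": "<model>", "dataset": "<dataset>", "method": "CCS", "accuracy": 0.0, "std": 0.0},
    {"model": "<model>", "dataset": "<dataset>", "method": "LR", "accuracy": 0.0, "std": 0.0},
]
pd.DataFrame(rows).to_csv("../test_data/<model>_normal_0.csv", index=False)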
