
implement test cases for reproducing the results #19

Closed

Conversation

mariopenglee (Contributor)

Only dataframe comparison for now; evaluations still need to be added.

Comment on lines 41 to 42
assert row["accuracy"] <= current_performance.iloc[index]["accuracy"]
assert row["std"] <= current_performance.iloc[index]["std"]
Collaborator

I guess we should maybe add some +/- epsilon (e.g. a value of 3) to increase the tolerance a bit?

Member

Yeah, probably. We should make sure the random seed is fixed and then just see empirically how big the discrepancy is with the current code.
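
A quick sketch of how that discrepancy could be measured once the seed is fixed (a minimal helper, assuming the dataframe and column names from the test code quoted above; the function name is made up):

import pandas as pd


def max_discrepancy(benchmark: pd.DataFrame, current: pd.DataFrame) -> pd.Series:
    # Largest absolute per-column deviation between the benchmark csv and the
    # current run, for the columns compared in the assertions above.
    return (current[["accuracy", "std"]] - benchmark[["accuracy", "std"]]).abs().max()

Running this once per model/dataset pair would show how large an epsilon is actually needed.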

@lauritowal changed the title from "implement test case for confusion" to "implement test cases for reproducing the results" on Feb 3, 2023
models_layer_num = default_config["models_layer_num"]


def train_and_evaluate(model_name, dataset):
Collaborator

We want to be able to train on one dataset and evaluate either on the same or another dataset.
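
A minimal sketch of what that signature could look like, assuming we keep the subprocess approach for now; the evaluation entry point and its flags below are placeholders, not the project's confirmed CLI:

import subprocess


def train_and_evaluate(model_name, train_dataset, eval_dataset=None):
    # Train on train_dataset, then evaluate on eval_dataset
    # (falls back to the training dataset when none is given).
    if eval_dataset is None:
        eval_dataset = train_dataset

    # Training command copied from the diff in this PR.
    training_command = (
        "python -m elk.train "
        f"--model {model_name} --prefix normal --dataset {train_dataset} "
        "--num_data 1000 --seed 0"
    )
    subprocess.run(training_command, shell=True, check=True)

    # Placeholder: substitute the project's real evaluation entry point here,
    # or call the evaluation function directly as discussed further down.
    evaluation_command = (
        "python -m elk.evaluate "
        f"--model {model_name} --prefix normal --dataset {eval_dataset} "
        "--num_data 1000 --seed 0"
    )
    subprocess.run(evaluation_command, shell=True, check=True)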

train_and_evaluate(model_name, dataset)

# Read the contents of the csv files into pandas dataframes
benchmark_performance = pd.read_csv(f"../test_data/{model_name}_confusion_0.csv")
Collaborator

To be able to reproduce the results of the paper, we mostly want the prefix "normal", not "confusion" (see the parser file). However, it might make sense to have an option to select between the different prefixes for the test.
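
One way to make the prefix selectable, as a sketch (the helper name and default are assumptions; the file-name pattern follows the line quoted above):

import pandas as pd


def load_benchmark(model_name: str, prefix: str = "normal") -> pd.DataFrame:
    # "normal" reproduces the paper's setting; other prefixes such as
    # "confusion" can still be selected explicitly for extra test variants.
    return pd.read_csv(f"../test_data/{model_name}_{prefix}_0.csv")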


# Compare the contents of the dataframes,
# make sure current performance is within threshold (epsilon) or better
epsilon = 3
Collaborator

It would be better to pass epsilon in as a parameter instead of hard-coding it here.
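
For example (function and argument names are illustrative, not part of the existing test; the direction of the std check is an assumption, taking lower std as better):

def compare_performance(benchmark, current, epsilon=3):
    # Assert that the current run is within epsilon of the benchmark, or better.
    for index, row in benchmark.iterrows():
        # Accuracy may fall below the benchmark by at most epsilon.
        assert current.iloc[index]["accuracy"] >= row["accuracy"] - epsilon
        # Std may exceed the benchmark by at most epsilon.
        assert current.iloc[index]["std"] <= row["std"] + epsilon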

Train and evaluate the model with the given dataset.
"""
training_command = "python -m elk.train "
f"--model {model_name} --prefix normal --dataset {dataset} --num_data 1000 --seed 0"
Collaborator

Maybe it would be better to call the functions from train.py and evaluate.py directly?
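
A hypothetical sketch of that; the import path, entry-point name, and argument object are assumptions that would need to be checked against train.py and evaluate.py in this repo:

from types import SimpleNamespace

from elk import train  # assumed module layout; adjust to the actual package


def train_directly(model_name, dataset):
    # Mirror the CLI flags used above as an argument object; the real entry
    # point may instead expect an argparse.Namespace, keyword arguments, or a
    # config object.
    args = SimpleNamespace(
        model=model_name, prefix="normal", dataset=dataset, num_data=1000, seed=0
    )
    train.main(args)  # assumed entry-point name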

benchmark_performance = pd.read_csv(f"../test_data/{model_name}_confusion_0.csv")
@lauritowal (Collaborator), Feb 4, 2023

You might want to create this csv file (probably manually) and add it to the repo, based on the format/structure of the current .csv there. It should contain all the different combinations from the table in the paper (for logistic regression and CCS only): https://ar5iv.labs.arxiv.org/html/2212.03827/assets/x8.png
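
A hedged sketch of how such a file could be assembled; the column names and the <model>/<dataset> values are placeholders to be replaced with the actual combinations and numbers from the paper's table:

import pandas as pd

# Placeholder rows only: add one row per model/dataset combination from the
# paper's table, for CCS and logistic regression (LR), with the reported
# accuracy and std values filled in.
rows = [
    {"model": "<model>", "dataset": "<dataset>", "method": "CCS", "accuracy": 0.0, "std": 0.0},
    {"model": "<model>", "dataset": "<dataset>", "method": "LR", "accuracy": 0.0, "std": 0.0},
]
pd.DataFrame(rows).to_csv("../test_data/<model>_normal_0.csv", index=False)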
