This repository has been archived by the owner on Jul 12, 2024. It is now read-only.

Feat/prism #248

Merged: 33 commits merged into develop from feat/prism on Dec 20, 2022

Conversation

fabrizio-credo
Contributor

Prototype for Prism object

Overall folder/file structure

  • comparators: contains the basic logic to compare different types of results. Comparators can be a function of the result container type and, potentially, of the evaluators. A rough sketch of this layout follows this list.

    • comparator.py: abstract class.
    • metric_comparator.py: takes a list of Lens objects, extracts their metric containers, and compares metrics across all the Lens objects. Comparisons are done with respect to a reference Lens object. Currently the reference is defined by model name; this can be changed in the future if necessary. @esherman-credo I simplified the logic: the all-vs-all comparison produced a lot of results that were probably not going to get used; we can reinstate it if need be. I have also added the possibility to select different operations.
  • compare.py: a type of task that Prism can orchestrate. At the moment it only calls the metric comparator; in the future compare.Compare will instantiate the necessary comparators (table, metric, etc.) depending on the existing result containers.

  • prism.py: the orchestrator, the equivalent of Lens but at a higher level. The purpose of Prism is to coordinate the various tasks. Currently, when the user instantiates it, they can pass a dictionary of tasks to perform on the Lens objects. The dictionary has the format {'task_name': {'parameter': value}}, where the parameters are specific to the type of task being performed.
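
For orientation, here is a minimal, self-contained sketch of that layout. The class and method names mirror the description above but are illustrative rather than the actual implementation; the real comparators operate on Lens result containers, not the plain dicts used here.

# Minimal, self-contained sketch of the comparator layout described above. The
# names and signatures are illustrative; the real classes in
# credoai/prism/comparators work on Lens result containers, not plain dicts.
import operator
from abc import ABC, abstractmethod
from typing import Callable, Dict


class Comparator(ABC):
    """Abstract base: holds results from several Lens runs and compares them."""

    def __init__(self, results: Dict[str, Dict[str, float]], ref: str):
        self.results = results  # {run_identifier: {metric_name: value}}
        self.ref = ref          # identifier of the reference run

    @abstractmethod
    def compare(self) -> Dict[str, Dict[str, float]]:
        ...


class MetricComparator(Comparator):
    """Compares each run's metrics against the reference run."""

    OPERATIONS: Dict[str, Callable[[float, float], float]] = {
        "diff": operator.sub,         # value - reference
        "ratio": lambda a, b: a / b,  # value / reference
    }

    def __init__(self, results, ref, operation: str = "diff"):
        super().__init__(results, ref)
        self.operation = self.OPERATIONS[operation]

    def compare(self):
        reference = self.results[self.ref]
        return {
            run_id: {
                name: self.operation(value, reference[name])
                for name, value in metrics.items()
                if name in reference
            }
            for run_id, metrics in self.results.items()
        }


# Usage with made-up numbers
runs = {
    "credit_RF1_classifier": {"accuracy_score": 0.81, "roc_auc_score": 0.77},
    "credit_XGB_classifier": {"accuracy_score": 0.79, "roc_auc_score": 0.75},
}
print(MetricComparator(runs, ref="credit_RF1_classifier", operation="diff").compare())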

Missing

  • Docstrings in general
  • Abstract classes for task objects (like compare.Compare). I haven't created an abstract class for this yet; I am wondering whether there is a need for one, or whether the variety of these objects will be too great for an abstract class to be useful.
  • Tests

To test the run you can use the following code:


from credoai.lens import Lens
from credoai.artifacts import TabularData, ClassificationModel
from credoai.evaluators import *
from credoai.governance import Governance
import numpy as np


# imports for example data and model training
from credoai.datasets import fetch_creditdefault
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

data = fetch_creditdefault()
df = data["data"]
df["target"] = data["target"].astype(int)

# fit model
model1 = RandomForestClassifier(random_state=42)
X = df.drop(columns=["SEX", "target"])
y = df["target"]
sensitive_features = df["SEX"]
(
    X_train,
    X_test,
    y_train,
    y_test,
    sensitive_features_train,
    sensitive_features_test,
) = train_test_split(X, y, sensitive_features, random_state=42)
model1.fit(X_train, y_train)

model2 = XGBClassifier(tree_method='hist', random_state=42, n_estimators=2, enable_categorical=True)
model2.fit(X_train, y_train)



# ## Set up artifacts

credo_RF1 = ClassificationModel(
    'credit_RF1_classifier',
    model1
)

credo_XGB = ClassificationModel(
    'credit_XGB_classifier',
    model2
)

train_data = TabularData(
    name = "UCI-credit-default-train",
    X=X_train,
    y=y_train,
    sensitive_features=sensitive_features_train
)
test_data = TabularData(
    name='UCI-credit-default-test',
    X=X_test,
    y=y_test,
    sensitive_features=sensitive_features_test
)


# pipelines can be specified using an sklearn-like style
metrics = ['accuracy_score', 'roc_auc_score']
pipeline = [
    (Performance(metrics)),
    (ModelFairness(metrics)),
]

pipeline2 = [
    (Performance(metrics)),
    (ModelFairness(metrics)),
]


# and added to lens in one fell swoop
lens_rf1 = Lens(
    model=credo_RF1,
    assessment_data=test_data,
    # training_data=train_data,
    pipeline=pipeline
)



lens_xgb = Lens(
    model=credo_XGB,
    assessment_data=test_data,
    # training_data=train_data,
    pipeline=pipeline2
)


from credoai.prism import Prism

prism_test = Prism(
    [lens_rf1, lens_xgb], 
    tasks={'compare':{'ref':'credit_RF1_classifier'}}
)

prism_test.execute()

prism_test.get_results()


@fabrizio-credo fabrizio-credo added the enhancement New feature or request label Nov 15, 2022
@fabrizio-credo fabrizio-credo self-assigned this Nov 15, 2022
Collaborator

@IanAtCredo IanAtCredo left a comment


Left some comments. Bullet point high level critiques:

  • I feel like the compare.py class is unneeded and can be rolled into Prism. Happy to discuss and be wrong here though.
  • The way a reference is defined and IDed is not flexible enough. Only works for cross-model comparisons, when really we want cross "context" comparisons. If the same model is run on different samples of data, we may want to compare those.
  • The user interface of Prism, which is supposed to simplify everything, instead obfuscates.
  • Documentation needed for most functions (though easy to follow anyway)

Review threads (now resolved) were opened on: credoai/prism/comparators/comparator.py, credoai/prism/comparators/metric_comparator.py, credoai/prism/prism.py, credoai/prism/compare.py
@fabrizio-credo
Contributor Author

@IanAtCredo Overall responses to the high level critique:

I feel like the compare.py class is unneeded and can be rolled into Prism. Happy to discuss and be wrong here though.

Initially I was of the same opinion; however, I feel we need this kind of object due to the variety of tasks that Prism will have to handle in the future (not just comparisons). If everything gets put into Prism, I feel it will balloon pretty quickly. I intended for Prism to handle only the orchestration logic, including the translation from user input into the tasks to perform.
This is based mostly on a feeling at the moment; I would like to create a sampling task following this schema and see how we feel about it at the end.

The way a reference is defined and IDed is not flexible enough. Only works for cross-model comparisons, when really we want cross "context" comparisons. If the same model is run on different samples of data, we may want to compare those.

Agreed, this was meant as a temporary solution. I think Data/Model/SensitiveFeature define the context at the moment; I wonder what the best way of conveying this information is. The simplest is a combination of Model_Name + Data_Name + Feature_Name/Value, but this will get ugly pretty fast.

Potential solution: leverage PipelineStep -> create a context property which is a function of the metadata and which associates a hash with the step. In my mind each step coincides with a context. All Prism tasks would use the hashes for internal referencing (this makes creating DataFrames easier than having to move dictionaries around).
We can then create a couple of Prism methods that translate to/from human-readable input/output.
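
A rough, self-contained sketch of that idea, using a simplified stand-in for PipelineStep (the real class wraps an evaluator and artifacts); the hashing scheme is an assumption, shown only to illustrate the proposal.

# Rough sketch of the hashed-context idea, using a simplified stand-in for
# PipelineStep. The hashing scheme is an assumption, not the actual class.
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PipelineStepSketch:
    metadata: Dict[str, str] = field(default_factory=dict)

    @property
    def context(self) -> str:
        # Stable hash built from the step metadata: any step with the same
        # model/data/sensitive-feature combination gets the same context id.
        canonical = "~".join(f"{k}:{self.metadata[k]}" for k in sorted(self.metadata))
        return hashlib.sha1(canonical.encode()).hexdigest()[:12]


step = PipelineStepSketch(
    metadata={
        "evaluator": "Performance",
        "model_name": "credit_RF1_classifier",
        "dataset_name": "UCI-credit-default-test",
    }
)
print(step.context)  # short, opaque id used for internal referencing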

The user interface of Prism, which is supposed to simplify everything, instead obfuscates.
Documentation needed for most functions (though easy to follow anyway)

Agreed, I wasn't extremely happy about it either. We can ask the user to pass the actual task class, like Compare. The tasks parameter in Prism would then look something like this:

[
 {
   task: Compare,
   params: {ref: ...}
 },
]

Another argument for keeping the task objects 😛. In any case, whether we map user input strings to behavior or pass tasks directly, a certain amount of knowledge is necessary on the user's side.
Stylistically this is more similar to what we currently do in Lens when we build the pipeline.

@IanAtCredo
Collaborator

Few thoughts:

Why does the user need to specify tasks at all, or at the level of granularity you are describing? Why not just "here are a few models, either aggregate or compare them"? I can buy that if you say "aggregate them" a list of tasks is implicitly created, but should that be the main way the user interacts? Perhaps some super-user will want specific access to the tasks (which we can allow), but that's secondary to the primary user interaction.

Data/Model/SensitiveFeature as context - yeah, I think we may want to define this somewhere. I was originally thinking that a run of Lens is the context. Why do we need to infer the context? Can't we just say "we took a bucket of what you said was one thing and compared it to another"? Generalizing this to evidence separated from Lens is trickier within your current implementation, but it could be done with a list of evidence lists, rather than inferring separation by IDs.

@fabrizio-credo
Contributor Author

fabrizio-credo commented Nov 28, 2022

@IanAtCredo Update:

  • Formalized the concept of Task
  • Updated the method of passing tasks to Prism; it now follows the same logic as passing evaluators to Lens
  • Updated PipelineStep to create a unique id for each step, dependent on the evaluator name, model name, dataset names, and sensitive feature
  • Updated the compare task and metric comparators to leverage the new id to match evidence for comparisons
  • Updated documentation all around
  • Updated how the reference is defined by the user (an illustrative call is sketched below):
    • User specifies which type of object they want to reference: model/assessment_data/training_data
    • User specifies the name of said object
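
Something along these lines, purely illustrative: the Compare import path and the task/ref_type/ref parameter names are assumptions pieced together from this thread, not the final API.

# Illustrative only: tasks are now passed as instantiated objects, the same way
# evaluators are passed to Lens. The import path and the task/ref_type/ref
# parameter names below are assumptions based on this discussion.
from credoai.prism import Prism
from credoai.prism.compare import Compare

prism_test = Prism(
    [lens_rf1, lens_xgb],
    task=Compare(ref_type="model", ref="credit_RF1_classifier"),
)
prism_test.execute()
results = prism_test.get_results()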

Collaborator

@IanAtCredo IanAtCredo left a comment


I'm not sure if changes need to be made, but I feel like the current "identifier" implementation is quite shaky and will lead to problems down the road. Let's discuss whether changes are necessary.

We can also call this an experimental feature, accept some brittleness and see if it is required by others.

@@ -35,6 +36,27 @@ def __post_init__(self):
        self.evaluator.metadata = self.metadata
        self.metadata["evaluator"] = self.evaluator.name

    @property
    def identifier(self):
Collaborator


I spent a while trying to make this more robust. As it is, this seems quite brittle! We're passing an ID that is made up on the fly here, to be referenced later via a hardcoded set of ID_Cols. That doesn't seem good.

Tried to replace with this:

    @property
    def id(self):
        eval_info = self.evaluator.get_info()["metadata"]
        return "~".join(f"{k}:{v}" for k, v in eval_info.items())

which makes use of another function we already have to create metadata (I also changed "identifier" => "id")

This results in outputs like this:
'evaluator:Performance~model_name:credit_RF1_classifier~assessment_dataset_name:UCI-credit-default-test'

This can't be used later, however, because the metadata isn't consistent across evaluators. Still, I'm confused why that should be an issue: shouldn't we be able to do comparisons evaluator-by-evaluator when they match up? The issue only comes up because everything is being coerced into one dataframe.

I tried to remove the explicit separation of the ID into ID_Cols, but then saw that the ID_Cols were used so that the comparison could be directed over the reference class, like "model". I was confused about that implementation as well. Once we know that we have the same ID between two Lens instances, shouldn't we be able to compare? Why do we need a ref class at all?

Contributor Author


The reference was split into ref_type and ref to allow the flexibility of comparing either by model or by whichever dataset (training/assessment) is available. This is to accommodate two types of use cases:

  1. Many-model comparisons: multiple iterations of the same model vs. the initial one, or different models against a reference one.
  2. Multiple-dataset comparisons: comparing results when the dataset changes, over time or due to some form of sampling, for example.

If the metadata structure were always constant, I would agree that we could simply pass self.evaluator.get_container_info(); but as it stands I would still have to take extra steps to standardize the output, similar to what I am already doing.

Standardization is also good for displaying results: breaking the id down into labelled columns helps users understand the results, rather than having to read a long and variable id (see the sketch below).
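
To make the point concrete, a small illustrative sketch; the id format and ID_COLS values here are invented for illustration, not taken from the code.

# Sketch of the "labelled columns" point: split a composite step id into named
# columns, then pick the reference rows via ref_type/ref. The id format and
# ID_COLS values are invented for illustration.
from pandas import DataFrame

ids = [
    "Performance~credit_RF1_classifier~UCI-credit-default-test~SEX",
    "Performance~credit_XGB_classifier~UCI-credit-default-test~SEX",
]
ID_COLS = ["evaluator", "model_name", "dataset_name", "sensitive_feature"]

table = DataFrame([dict(zip(ID_COLS, i.split("~"))) for i in ids])

ref_type, ref = "model_name", "credit_RF1_classifier"
reference_rows = table[table[ref_type] == ref]
comparison_rows = table[table[ref_type] != ref]
print(reference_rows)
print(comparison_rows)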

be instantiated with their necessary artifacts. One or multiple Lens objects
can be provided, each Task will validate that the amount of objects provided
is suitable for its requirement.
task : Task
Collaborator


You said tasks work as evaluators, but here I see I can only pass one task.

Contributor Author


Yeah, I consciously constrained the power of Prism. Tasks as we imagined them would be of two types:

  1. Calculations on Lens results that produce numerical output (compare, for example)
  2. Operations on the instances, for example creating new ones based on sampling of the datasets

I imagine that at the end of a Prism run some form of numerical results needs to be produced.
That being the case, a type 2 task would require a type 1 task to be run afterwards so that numerical output can be produced. This would require enforcing an order of execution (or adding dependencies between tasks).

I decided to avoid that complication by allowing one task to call another. So, for example, a hypothetical cross-validation task would:

  1. Create the Lens instances by splitting the dataset appropriately
  2. Run compare to produce numerical results

At the moment I don't anticipate the need to run super-complex sequences of tasks; this might be a reflection of the fact that we don't have an actual use case for Prism yet. (A rough sketch of this pattern follows below.)
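
A hypothetical sketch of that pattern, a type 2 task building per-fold Lens runs and then delegating to a type 1 compare task; every name here is invented for illustration and none of it is credoai code.

# Hypothetical sketch of a type 2 task delegating to a type 1 task. Every name
# here is invented for illustration. X, y and sensitive are assumed to be
# pandas objects, and build_lens is assumed to return Lens-like objects
# exposing a run() method.
from sklearn.model_selection import KFold


def cross_validation_task(model_artifact, X, y, sensitive, build_lens, compare_task, n_splits=3):
    """Split the data, build one Lens run per fold, then hand the runs to a compare task."""
    lens_runs = []
    for _, test_idx in KFold(n_splits=n_splits, shuffle=True, random_state=42).split(X):
        # Type 2 work: create one Lens instance per data split
        lens_runs.append(
            build_lens(model_artifact, X.iloc[test_idx], y.iloc[test_idx], sensitive.iloc[test_idx])
        )
    for lens in lens_runs:
        lens.run()
    # Type 1 work: delegate to the compare task so numerical output is produced
    return compare_task(lens_runs)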

from typing import Callable, List, Literal, Optional

import numpy as np
from pandas import DataFrame, concat
Collaborator


My preference is for import package as ... because I find it more explicit. We import the entire package regardless, and you can easily see that this is "pd.DataFrame" rather than some other DataFrame. It's also the convention I see everywhere for pandas and numpy.
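
For illustration, the two styles side by side:

# Namespace import: the conventional style for pandas/numpy, and it is obvious
# where DataFrame comes from at the call site.
import pandas as pd

df = pd.DataFrame({"a": [1, 2]})

# Direct-name import: shorter, but hides the origin of DataFrame.
from pandas import DataFrame

df = DataFrame({"a": [1, 2]})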

@IanAtCredo IanAtCredo self-requested a review December 16, 2022 17:59
Collaborator

@IanAtCredo IanAtCredo left a comment


As mentioned, I really don't think this ID implementation is great. Or, at least, the ID specification conflicts with other "sources of truth" about artifacts reflected in get_container_info. I also don't think parsing an ID string is reasonable - why not query the evaluator directly during the compare step?

All that said, I don't think we should invest more time in this. It works, let's move on.

@fabrizio-credo fabrizio-credo changed the title from [WIP] Feat/prism to Feat/prism on Dec 19, 2022
@github-actions

Coverage Report

File | Stmts | Miss | Cover | Missing
credoai/__init__.py | 3 | 0 | 100% |
credoai/artifacts/__init__.py | 7 | 0 | 100% |
credoai/artifacts/data/__init__.py | 0 | 0 | 100% |
credoai/artifacts/data/base_data.py | 107 | 12 | 89% | 55, 155, 158, 173, 180, 187, 191, 195, 199, 211, 214, 221
credoai/artifacts/data/comparison_data.py | 63 | 13 | 79% | 53, 60, 71, 76, 81, 90, 96, 100, 105, 114, 147, 153, 156
credoai/artifacts/data/tabular_data.py | 40 | 6 | 85% | 52, 73, 77, 96, 98, 105
credoai/artifacts/model/__init__.py | 0 | 0 | 100% |
credoai/artifacts/model/base_model.py | 36 | 2 | 94% | 56, 88
credoai/artifacts/model/classification_model.py | 23 | 0 | 100% |
credoai/artifacts/model/comparison_model.py | 11 | 0 | 100% |
credoai/artifacts/model/constants_model.py | 2 | 0 | 100% |
credoai/artifacts/model/regression_model.py | 11 | 4 | 64% | 43–45, 48
credoai/evaluators/__init__.py | 15 | 0 | 100% |
credoai/evaluators/data_fairness.py | 147 | 12 | 92% | 83–90, 205, 260–261, 287, 311, 334–340, 356
credoai/evaluators/data_profiler.py | 34 | 2 | 94% | 57, 60
credoai/evaluators/deepchecks.py | 40 | 3 | 92% | 113–122
credoai/evaluators/equity.py | 153 | 31 | 80% | 78, 181–184, 204, 230–257, 281–296, 307–309, 358–359
credoai/evaluators/evaluator.py | 70 | 7 | 90% | 58, 61, 80, 106, 126, 174, 181
credoai/evaluators/fairness.py | 149 | 12 | 92% | 117, 238, 246–251, 318–327, 329, 341–344
credoai/evaluators/feature_drift.py | 59 | 1 | 98% | 66
credoai/evaluators/identity_verification.py | 112 | 2 | 98% | 144–145
credoai/evaluators/model_profiler.py | 74 | 11 | 85% | 128–131, 145–148, 182–183, 192–193, 231
credoai/evaluators/performance.py | 119 | 14 | 88% | 110, 137–143, 232–241, 243, 260–263
credoai/evaluators/privacy.py | 118 | 4 | 97% | 410, 447–449
credoai/evaluators/ranking_fairness.py | 134 | 14 | 90% | 136–137, 157, 178, 184–185, 382–404, 409–439
credoai/evaluators/security.py | 96 | 1 | 99% | 297
credoai/evaluators/shap.py | 87 | 14 | 84% | 119, 127–128, 138–144, 170–171, 253–254, 284–292
credoai/evaluators/survival_fairness.py | 67 | 50 | 25% | 29–33, 36–48, 53–64, 67–78, 81–99, 102, 105, 108
credoai/evaluators/utils/__init__.py | 3 | 0 | 100% |
credoai/evaluators/utils/fairlearn.py | 18 | 2 | 89% | 46, 59
credoai/evaluators/utils/utils.py | 8 | 1 | 88% | 9
credoai/evaluators/utils/validation.py | 81 | 28 | 65% | 14, 34–35, 37–39, 46, 67–74, 80–86, 89, 95–98, 105, 108, 111, 114–115, 119–121
credoai/governance/__init__.py | 1 | 0 | 100% |
credoai/lens/__init__.py | 2 | 0 | 100% |
credoai/lens/lens.py | 201 | 13 | 94% | 53, 195–196, 232–237, 294, 336, 360, 442, 457, 461, 473
credoai/lens/pipeline_creator.py | 60 | 12 | 80% | 20–21, 37, 79–91
credoai/lens/utils.py | 39 | 28 | 28% | 20–27, 49–52, 71–82, 99, 106–109, 128–135
credoai/modules/__init__.py | 3 | 0 | 100% |
credoai/modules/constants_deepchecks.py | 2 | 0 | 100% |
credoai/modules/constants_metrics.py | 19 | 0 | 100% |
credoai/modules/constants_threshold_metrics.py | 3 | 0 | 100% |
credoai/modules/metric_utils.py | 24 | 18 | 25% | 15–30, 34–55
credoai/modules/metrics.py | 61 | 7 | 89% | 62, 66, 69–70, 73, 83, 120
credoai/modules/metrics_credoai.py | 135 | 47 | 65% | 68–69, 73, 93–102, 107–109, 132–160, 176–179, 206, 230–231, 294–296, 372–378, 414–415, 485–486
credoai/modules/stats.py | 39 | 28 | 28% | 11–14, 17–22, 25–27, 30–35, 38–52, 55–60
credoai/modules/stats_utils.py | 5 | 3 | 40% | 5–8
credoai/prism/__init__.py | 3 | 0 | 100% |
credoai/prism/compare.py | 35 | 2 | 94% | 70, 86
credoai/prism/prism.py | 36 | 4 | 89% | 46, 48, 59, 86
credoai/prism/task.py | 17 | 2 | 88% | 30, 37
credoai/prism/comparators/__init_.py | 0 | 0 | 100% |
credoai/prism/comparators/comparator.py | 17 | 3 | 82% | 34, 42, 47
credoai/prism/comparators/metric_comparator.py | 44 | 2 | 95% | 125, 131
credoai/utils/__init__.py | 5 | 0 | 100% |
credoai/utils/common.py | 102 | 40 | 61% | 55, 68–69, 75, 84–91, 96–104, 120–126, 131, 136–141, 152–159, 186
credoai/utils/constants.py | 2 | 0 | 100% |
credoai/utils/dataset_utils.py | 61 | 35 | 43% | 23, 26–31, 50, 54–55, 88–119
credoai/utils/logging.py | 55 | 13 | 76% | 10–11, 14, 19–20, 23, 27, 44, 58–62
credoai/utils/model_utils.py | 30 | 11 | 63% | 14–19, 29–30, 35–40
credoai/utils/version_check.py | 11 | 1 | 91% | 16
TOTAL | 2899 | 515 | 82% |

@fabrizio-credo fabrizio-credo merged commit 7f49358 into develop Dec 20, 2022
@fabrizio-credo fabrizio-credo deleted the feat/prism branch December 20, 2022 00:11