To get a quick overview of what JustCause has to offer, let's take a look at the package structure:
├── contrib <- third party methods not meant to be accessed directly ├── data │ ├── generators <- data generators for full synthetic data sets │ ├── sets <- empirical data sets like IHDP, Twins, etc. │ ├── frames <- provides the CausalFrame │ ├── transport <- functionality to download data sets │ └── utils <- generic helper functions for data sets ├── learners │ ├── ate <- average treatment effect estimators │ ├── meta <- meta learners working with the help of classical estimators │ ├── propensity <- functionality estimate propensity scores │ └── utils <- generic helper functions for learners ├── evaluation <- helper functions for evaluation ├── metrics <- various metrics to compare a result to the ground truth └── utils <- most generic helper functions not related to data and learners
Most commonly you will deal with :mod:`.data.generators` and :mod:`.data.sets` to generate or fetch a data set and apply some basic learners within :mod:`.learners`. To evaluate your results you can use :mod:`.metrics` and :mod:`.evaluation`. All methods within :mod:`.contrib` are not meant to be accessed directly and are wrapped within :mod:`.learners`.
Note
Since version 0.4 the :mod:`.learners` is greatly reduced to focus the attention and efforts of JustCause on evaluation. See section `Implement
Due to the so called Fundamental Problem of Causal Inference, there is no ground truth for any real treatment effect dataset. In order to be able to evaluate methods, we thus need to resort to semi- or fully-synthetic data. The process of generating such a dataset is called a data generating process (DGP). We distinguish between i) an Empirical Monte Carlo Study [1] which uses real covariates - the features of real instances (e.g. patients, ...) - and generates a synthetic potential outcome on top of it and ii) a fully synthetic approach, where covariates are sampled from some distribution.
Briefly, a reference data set following our convention contains these columns with special meaning:
t
: binary treatment indicatory
: observed outcomey_cf
: counterfactual outcomey_0
: untreated potential outcome with possible noisey_1
: treated potential outcome with possible noisemu_0
: true untreated potential outcome without noisemu_1
: true treated potential outcome without noiseite
: true individual treatment effect
For those columns the following relationships hold:
y = t*y_1 + (1-t)*y_0
y_cf = (1-t)*y_1 + t*y_0
(counterfactual ofy
)y = y_0 if t == 0 else y_1
y_0 = mu_0 + e
andy_1 = mu_1 + e
where e (read epsilon) is some random noise or 0ite = mu_1 - mu_0
Besides these columns, there are covariates (also called features) and optionally other columns for managing meta information
like datetime or an id of sample. Within the provided data sets covariates are called x_0
, x_1
, etc. by default.
However, named covariates from the IBM or Twins studies are kept in their format (e.g. dob_mm
for date-of-birth month) for clarity.
Accordingly you can also use any name if you use your own data set as explained below. The matrix of all covariates is X := [x_0, x_1, ..., x_n]
and its usage is explained below.
Besides covariates, the provided data sets have a column sample_id
to easily identify one sample in different replications.
Since most DGPs are based on some form of random sampling, usually researchers use multiple so called replications of the same data to avoid a large influence of the randomness underlying the distributions. A replication is generated by sampling from the probability distributions that define the data. In the case of IHDP 1000 replications of the same data are used for a full evaluation, thus ensuring robust evaluation results.
The concept of replications is build into JustCause by design to encourage robust comparisons.
JustCause uses a generalization of a Pandas :class:`~pandas.DataFrame` for managing your data named :class:`~justcause.CausalFrame`. A CausalFrame encompasses all the functionality of a Pandas DataFrame but additionally keeps track which columns, besides the ones with special meanings like explained above, are covariates or others. This allows to easily access them in a programmatic way.
All data sets provided by JustCause are provided as lists of CausalFrames, i.e. for each replication one CausalFrame.
Thus, we get a single CausalFrame cf
from one of the provided data sets by:
>>> from justcause.data.sets import load_ihdp >>> cf = load_ihdp(select_rep=0)[0] # select replication 0 >>> type(cf) justcause.data.frames.CausalFrame
As usual, cf.columns
would list the names of all columns. To find out which of these columns are covariates or
others, we can use the attribute accessor names
:
>>> cf.names.covariates ['x_0', 'x_1', 'x_2', ..., 'x_22', 'x_23', 'x_24'] >>> cf.names.others ['sample_id']
This allows us to easily apply transformations for instance only to covariates. In general, this leads to more robust code
since the API of a CausalFrame enforces the differentiation between covariates, columns with special meaning, e.g.
outcome y
, treatment t
and other columns such as metadata like a datetime or an id of an observation, e.g. sample_id
.
If we want to construct a CausalFrame, we do that just in the same way as with a DataFrame but have to specify covariate columns:
>>> import justcause as jc >>> from numpy.random import rand, randint >>> import numpy as np >>> import pandas as pd >>> N = 10 >>> mu_0 = np.zeros(N) >>> mu_1 = np.zeros(N) >>> ite = mu_1 - mu_0 >>> y_0 = mu_0 + 0.1*rand(N) >>> y_1 = mu_1 + 0.1*rand(N) >>> t = randint(2, size=N) >>> y = np.where(t, y_1, y_0) >>> y_cf = np.where(t, y_0, y_1) >>> dates = pd.date_range('2020-01-01', periods=N) >>> cf = jc.CausalFrame({'c1': rand(N), >>> 'c2': rand(N), >>> 'date': dates, >>> 't': t, >>> 'y': y, >>> 'y_cf': y_cf, >>> 'y_0': y_0, >>> 'y_1': y_1, >>> 'mu_0': mu_0, >>> 'mu_1': mu_1, >>> 'ite': ite >>> }, >>> covariates=['c1', 'c2'])
All columns that are neither covariates nor columns with special meaning like t
and y
are treated as others:
>>> cf.names.others ['date']
Within the PyData stack, Numpy surely is the lowest common denominator and is thus used by a lot of libraries. Since JustCause mainly wraps third-party libraries for causal methods under a common API, the decision was taken to only allow passing Numpy arrays to the learners, i.e. causal methods, within JustCause. This allows for more flexibility and keeps the abstraction layer to the original method much smaller.
The fit
method of a learner takes at least the parameters X
for the covariate matrix, t
for the treatment
and y
for the outcome, i.e. target, vector as Numpy arrays. In order to bridge the gap between rich CausalFrames and
plain arrays, a :class:`~justcause.CausalFrame` provides the attribute accessor np
(for numpy). Using it, we can easily pass
the covariates X
, treatment t
and outcome y
to a learner:
>>> from sklearn.ensemble import RandomForestRegressor >>> reg = RandomForestRegressor() >>> learner = jc.learners.SLearner(reg) >>> learner.fit(cf.np.X, cf.np.t, cf.np.y)
The central element of JustCause is evaluation. We want to score learners on various datasets using common metrics. This can either be done manually, or using predefined standard routines (:func:`~justcause.evaluation.evaluate_ite`). JustCause allows you to do both.
The simplest and fastest evaluation is using standard datasets and the methods provided by JustCause:
from justcause.learners import SLearner, TLearner from justcause.metrics import pehe_score, mean_absolute from justcause.data.sets import load_ihdp replications = load_ihdp(select_rep=np.arange(100)) metrics = [pehe_score, mean_absolute] train_size = 0.8 random_state = 42 methods = [basic_slearner, weighted_slearner] # All in standard configuration methods = [SLearner(), TLearner()] result = evaluate_ite(replications, methods, metrics, train_size=train_size, random_state=random_state)
Here, we use two methods basic_slearner
and weighted_slearner
that haven't been defined yet. To better understand what's happening inside
and how to customize, let us take a look at an evaluation loop in more detail.
Let's implement a simple evaluation of two learners - a weighted SLearner vs. a standard SLearner. The standard SLearner is already provided in :class:`~justcause.learners.meta.slearner.SLearner`, while the weighted SLearner requires a slight adaption. We define a callable, which takes train and test data, fits a weighted model and predicts ITE for both train and test samples:
from justcause.learners import SLearner from justcause.learners.propensity import estimate_propensities from sklearn.linear_model import LinearRegression def weighted_slearner(train, test): """ Custom method that takes 'train' and 'test' CausalFrames (see causal_frames.ipynb) and returns ITE predictions for both after training on 'train'. Implement your own method in a similar fashion to evaluate them within the framework! """ train_X, train_t, train_y = train.np.X, train.np.t, train.np.y test_X, test_t, test_y = test.np.X, test.np.t, test.np.y # Get calibrated propensity estimates p = estimate_propensities(train_X, train_t) # Make sure the supplied learner is able to use `sample_weights` in the fit() method slearner = SLearner(LinearRegression()) # Weight with inverse probability of treatment (inverse propensity) slearner.fit(train_X, train_t, train_y, weights=1/p) return ( slearner.predict_ite(train_X, train_t, train_y), slearner.predict_ite(test_X, test_t, test_y) ) def basic_slearner(train, test): """Basic SLearner callable""" train_X, train_t, train_y = train.np.X, train.np.t, train.np.y test_X, test_t, test_y = test.np.X, test.np.t, test.np.y slearner = SLearner(LinearRegression()) slearner.fit(train_X, train_t, train_y) return ( slearner.predict_ite(train_X, train_t, train_y), slearner.predict_ite(test_X, test_t, test_y) )
Note
Another way to add new learners is to implement them as a class similiar to the implementations in :mod:`~justcause.learners`
(for example :class:`~justcause.learners.meta.slearner.SLearner`) providing at least the methods fit(x, t, y)
and predict_ite(x, t, y)
.
See the section Implementing New Learners for more.
Given the two functions defined above, we can go ahead and write our own simple evaluation loop:
import numpy as np import pandas as pd from sklearn.model_selection import train_test_split from justcause.data import Col from justcause.data.sets import load_ihdp from justcause.metrics import pehe_score, mean_absolute from justcause.evaluation import calc_scores, summarize_scores replications = load_ihdp(select_rep=np.arange(100)) metrics = [pehe_score, mean_absolute] train_size = 0.8 random_state = 42 methods = [basic_slearner, weighted_slearner] results = list() for method in methods: test_scores = list() train_scores = list() for rep in replications: train, test = train_test_split( rep, train_size=train_size, random_state=random_state ) # REPLACE this with the function you implemented and want to evaluate train_ite, test_ite = method(train, test) # Calculate the scores and append them to a dataframe test_scores.append(calc_scores(test[Col.ite], test_ite, metrics)) train_scores.append(calc_scores(train[Col.ite], train_ite, metrics)) # Summarize the scores and save them in a dataframe train_result, test_result = summarize_scores(train_scores), summarize_scores(test_scores) train_result.update({'method': method.__name__, 'train': True}) test_result.update({'method': method.__name__, 'train': False}) results.append(train_result) results.append(test_result)
Finally, we can compare the results:
method | train | pehe_score-mean | ... | mean_absolute-std |
basic_slearner | True | 5.633659795888 | ... | 1.4932757697867 |
basic_slearner | False | 5.625971000721 | ... | 2.4746034286861 |
weighted_slearner | True | 5.592355721307 | ... | 0.5243953093767 |
weighted_slearner | False | 5.493401193725 | ... | 0.9419412237398 |
In the above evaluation loop, train_scores
contains the scores of ITE prediction on the train set for each replication.
To better understand what's happening inside, let's take a look at these intermediate scores:
>>> pd.DataFrame(train_scores) # for better visualization # pehe_score mean_absolute 0 0.893524 0.074874 1 0.826433 0.200816 2 0.909720 0.080099 3 1.945077 0.091223 4 2.671555 0.466394 ... ... ... 95 2.194153 0.180240 96 2.161083 0.087108 97 13.238825 1.218813 98 3.917264 0.054858 99 2.538756 0.654481
And we then summarize these scores using different formats (like np.mean
, np.std
, ...):
>>> train_result = summarize_scores(train_scores) >>> pd.DataFrame([train_result])
which yields:
pehe_score-mean | pehe_score-median | pehe_score-std | ... | mean_absolute-std |
5.592356 | 2.569472 | 8.248291 | ... | 0.524395 |
There's two things we can make a lot simpler using JustCause:
- Standard methods can be used as-is
- Standard evaluation is pre-implemented
Using the standard evaluation looks like this:
from justcause.evaluation import evaluate_ite result = evaluate_ite(replications, methods, metrics, train_size=train_size, random_state=random_state)
And, we can also get rid of basic_slearner
since that is the default usage of a learner:
fit on train, predict on train and test without special settings or parameters. Instead, we simply
pass the instantiation of the SLearner
along to the methods parameter.
# All in standard configuration methods = [SLearner(), weighted_slearner] result = evaluate_ite(replications,
methods, metrics, train_size=train_size, random_state=random_state)
Note
Note that the Meta Learners use a default setting to determine which regression to use when none is provided.
JustCause provides some of the most common reference dataset, but is open for extension. You can either provide fixed reference datasets or define a parametric data generation process (DGP) that generates new data.
In the JustCause Data Repository we provide datasets in the .parquet
format, which is highly efficient and can easily be read by Pandas.
In order to avoid duplicate data we store covariates and outcomes in separate files and only join them upon loading.
This is to say that usually we have a fixed set of covariates for a number of instances.
In the outcomes file we define factual outcomes and counterfactual for these instances for one or multiple replications.
ext:rst
Note
If you have a new reference dataset or a useful set of covariates and want to allow others to use it,
feel free to submit a Pull Request in the JustCause Data Repository and this repo. See :mod:`.data.sets.ihdp` for an example.
In :mod:`.data.transport` we provide useful methods for fetching data. You only have to add the top level access and the respective
directory in justcause-data
.
If you only want to use your data once, you can simply load it directly into a CausalFrame as shown in section :ref:`handling-data`.
Another way to generate data is by defining the functions of the potential outcomes based on the covariates. This allows to sample as many replications as one requires. Let's walk through an example to see what parts are required. Let's assume with work with the covariates from the IHDP study provided in :mod:`.data.sets.ihdp`.
Note
For a fully fledged DGP we need:
- Covariates
- Potential Outcomes with and without noise
- Treatment Assignment
We simply access the covariates with:
>>> from justcause.data.sets.ihdp import get_ihdp_covariates >>> covariates = get_ihdp_covariates() >>> covariates.shape (747, 25)
Let's define the outcome based on the following function:
y_0 &= N(0, 0.2) \\ y_1 &= y_0 + \tau \\
where
c &= \mathbb{I}(sigmoid(X_8) > 0.5) \\ \tau &= N(3*c + (1 - c), 0.1).
To implement that as a DGP in JustCause we define the outcome function as follows:
from sklearn.utils import check_random_state # ensures usable random state from justcause.data.utils import generate_data from scipy.special import expit def outcome(covariates, *, random_state: RandomState, **kwargs): random_state = check_random_state(random_state) # define tau prob = expit(covariates[:, 7]) > 0.5 tau = random_state.normal((3 * prob) + 1 * (1 - prob), 0.1) y_0 = random_state.normal(0, 0.2, size=len(covariates)) y_1 = y_0 + tau mu_0, mu_1 = y_0, y_1 # no noise for this example return mu_0, mu_1, y_0, y_1
Hint
Every outcome function you want to use with JustCause must take a covariates
parameter and a random_state
. Using the
**kwargs
, you can pass further parameters to the outcome and treatment assignemnt function.
The outcome function must return all four potential outcomes (with and without noise).
In order to get a confounded example, we assign treatment based on the covariates X_8
that was already used to define
the strength of the treatment effect.
t = \text{BERN}[sigmoid(X_8)]
As a function this is simply:
def treatment(covariates, *, random_state: RandomState, **kwargs): random_state = check_random_state(random_state) return random_state.binomial(1, p=expit(covariates[:, 7]))
Hint
The treatment function also has to accept covariates
and random_state
arguments and should return
the treatment indicator vector for the given covariates. Again, **kwargs
can be used to pass further parameters
With the covariates we fetched and the outcomes we defined, we can now sample data from that DGP using the powerful :meth:`~justcause.data.utils.generate_data`:
>>> replications = generate_data( covariates, treatment, outcome, n_samples=747, # Optional but 747 is the maximum available with IHDP covariates n_replications=100, random_state=0, # Fix random_state for replicability **kwargs=None, # Use if further parameters are required )
By using :meth:`~justcause.data.utils.generate_data` we encourage a consistent terminology for DGPs. This is important as we've found that different researchers use different formalizations that are technically identical. However, assuring that they are actually the same requires one to transform the notation.
Take for example the synthetic example studies in the RLearner Paper where outcomes are defined as
y_i = b(X) + (W - 0.5) \cdot \tau(X) + \sigma \epsilon(X).
That is to say, they start from a base value b(X) and add or substract half the treatment effect \tau(X) depending on the treatment. This can be defined equivalently in our terminology as:
\mu_{0} = b(X) - \frac{1}{2}\cdot \tau, \\ \mu_{1} = b(X) + \frac{1}{2}\cdot \tau, \\ y_{0} = \mu_{0} + \sigma \epsilon(X), \\ y_{1} = \mu_{1} + \sigma \epsilon(X).
We encourage users of JustCause to start their considerations with the terminology introduced at the top of this document.
As of JustCause 0.4 only very basic learners - namely the S- and T-Learner are provided in :mod:`.learners`. Here, we clarify how to implement and use more learners with JustCause. The consideration behind removing all the learners was, that all the different packages out there (e.g. causalML) sport different APIs for there learners and are changing quickly. Instead of exerting efforts on trying to unify these APIs with the one proposed in JustCause, we provide two ways of adapting whatever methods you have at hand to work with Justcause.
- Implementation as a method (See Evaluating Learners)
- Implementation as a class
For recurring use of a learner within the JustCause package it might be favorable to wrap a learner in a class. For example, the RLearner from causalML can be wrapped in the way it was done it JustCause 0.3.2
"""Wrapper of the python RLearner implemented in the ``causalml`` package""" from typing import Optional, Union import numpy as np from numpy.random import RandomState from sklearn.linear_model import LinearRegression from sklearn.utils import check_random_state from ..propensity import estimate_propensities StateType = Optional[Union[int, RandomState]] class RLearner: """A wrapper of the BaseRRegressor from ``causalml`` Defaults to LassoLars regression as a base learner if not specified otherwise. Allows to either specify one learner for both tasks or two distinct learners for the task outcome and effect learning. """ def __init__( self, learner=None, outcome_learner=None, effect_learner=None, random_state: StateType = None, ): """Setup an RLearner with defaults Args: learner: default learner for both outcome and effect outcome_learner: specific learner for outcome effect_learner: specific learner for effect random_state: RandomState or int to be used for K-fold splitting. NOT used in the learners, this has to be done by the user. """ from causalml.inference.meta import BaseRRegressor if learner is None and (outcome_learner is None and effect_learner is None): learner = LinearRegression() self.random_state = check_random_state(random_state) self.model = BaseRRegressor( learner, outcome_learner, effect_learner, random_state=random_state ) def fit(self, x: np.array, t: np.array, y: np.array, p: np.array = None) -> None: """Fits the RLearner on given samples. Defaults to `justcause.learners.propensities.estimate_propensities` for ``p`` if not given explicitly, in order to allow a generic call to the fit() method Args: x: covariate matrix of shape (num_instances, num_features) t: treatment indicator vector, shape (num_instances) y: factual outcomes, (num_instances) p: propensities, shape (num_instances) """ if p is None: # Propensity is needed by CausalML, so we estimate it, # if it was not provided p = estimate_propensities(x, t) self.model.fit(x, p, t, y) def predict_ite(self, x: np.array, *args) -> np.array: """Predicts ITE for given samples; ignores the factual outcome and treatment Args: x: covariates used for precition *args: NOT USED but kept to work with the standard ``fit(x, t, y)`` call """ # assert t is None and y is None, "The R-Learner does not use factual outcomes" return self.model.predict(x).flatten() def estimate_ate( self, x: np.array, t: np.array, y: np.array, p: Optional[np.array] = None ) -> float: """Estimate the average treatment effect (ATE) by fit and predict on given data Estimates the ATE as the mean of ITE predictions on the given data. Args: x: covariates of shape (num_samples, num_covariates) t: treatment indicator vector, shape (num_instances) y: factual outcomes, (num_instances) p: propensities, shape (num_instances) Returns: the average treatment effect estimate """ self.fit(x, t, y, p) ite = self.predict_ite(x, t, y) return float(np.mean(ite))
In the code above, we've used the internal functionality of the causalml class BaseRRegressor
and have wrapped in our API definition that works directly within the JustCause evaluation.
Having implemented that once, we can used it in the prototypical evalaution just like the :class:`~justcause.learners.meta.slearner.SLearner`
>>> methods = [SLearner(), RLearner()] >>> result = evaluate_ite(replications, methods, metrics, train_size=train_size, random_state=random_state)
Similarly, we could wrap the RLearner in a function, like it was done in Evaluating Learners for the SLearner.