[New refutation] Add OverRule for learning Boolean rules to describe support/overlap (#791)

* feat: Add scaffolding for overrule, including basic test

Signed-off-by: Michael Oberst <[email protected]>

* feat: Update dependencies for overrule

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add the full set of overrule code

Signed-off-by: Michael Oberst <[email protected]>

* style: Black styling on overrule code

style: Black formatting on beam_search

style: Black formatting for ruleset

style: Black format utils

style: Black formatting on load_process_data_BCS
Signed-off-by: Michael Oberst <[email protected]>

* fix: Change np.matmul to np.dot

Signed-off-by: Michael Oberst <[email protected]>

* fix: Update to appropriate matmul notation for latest CVXPY

Signed-off-by: Michael Oberst <[email protected]>

* feat: Remove unnecessary overrule code

Signed-off-by: Michael Oberst <[email protected]>

* feat: Minimum working example for OverRule

Signed-off-by: Michael Oberst <[email protected]>

* test: Minimum viable test for OverRule

test: Fix test to work with new interface
Signed-off-by: Michael Oberst <[email protected]>

* feat: Print rules with option to recompute metrics

Signed-off-by: Michael Oberst <[email protected]>

* feat: Improve printing of results for refutation

Signed-off-by: Michael Oberst <[email protected]>

* feat: Pass in additional arguments to OverRule

Signed-off-by: Michael Oberst <[email protected]>

* docs: Add docstrings to ruleset.py

docs: Docstrings for assess_overlap.py
Signed-off-by: Michael Oberst <[email protected]>

* feat: Update logger

Signed-off-by: Michael Oberst <[email protected]>

* docs: Add additional docstrings

Signed-off-by: Michael Oberst <[email protected]>

* fix: Path bug

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add notebook with a toy example to demonstrate OverRule

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add back using LP coeff by default

Signed-off-by: Michael Oberst <[email protected]>

* docs: Typing and docstrings

docs: Consistent module docstrings

docs: Add docstrings to overrule/utils.py

docs: Add docstrings to overrule/BCS/beam_search.py

docs: Add docstrings and typing to load_process_data_BCS

docs: Add docstrings and type hints to BCS/overlap_boolean_rule.py
Signed-off-by: Michael Oberst <[email protected]>

* docs: Fix and rename notebook for demonstrating overrule

Signed-off-by: Michael Oberst <[email protected]>

* feat: Use default_rng instead of setting a global seed in sample_Unif

Signed-off-by: Michael Oberst <[email protected]>

* fix: Use rng in place of numpy.random

Signed-off-by: Michael Oberst <[email protected]>

* fix: Replace list with numpy array to fix type error

Signed-off-by: Michael Oberst <[email protected]>

* docs: Fix type hint on ref_range

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add option to only fit overlap or support rules

Signed-off-by: Michael Oberst <[email protected]>

* docs: Flesh out example notebook with parameters

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add thresh_override as an argument for configuration

Signed-off-by: Michael Oberst <[email protected]>

* feat: Functional API for overrule has defaults

Signed-off-by: Michael Oberst <[email protected]>

* docs: Add API reference

Signed-off-by: Michael Oberst <[email protected]>

* ci: Fix support rule test

Signed-off-by: Michael Oberst <[email protected]>

* chore: Update poetry.lock for cvxpy dependency

Signed-off-by: Michael Oberst <[email protected]>

* feat: Remove `XGBClassifier` as default classifier

To avoid dependency on `xgboost`, replace `XGBClassifier` as the default
propensity score model with `RandomForestClassifier` from `sklearn`

Signed-off-by: Michael Oberst <[email protected]>

* fix: Fix logic so that when verbose=True, silent=False

Signed-off-by: Michael Oberst <[email protected]>

* feat: Remove seaborn dependency from overrule notebook

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add option to pass random seed to support estimation

Signed-off-by: Michael Oberst <[email protected]>

* docs: Update notebook to use random seed on support estimation

Signed-off-by: Michael Oberst <[email protected]>

* fix: Typo

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add PSID dataset (observational controls for Lalonde)

Signed-off-by: Michael Oberst <[email protected]>

* fix: Prevent fitting overlap rules if all samples in overlap region

One of the assertions in OverlapBooleanRule will trip if all samples are
in the overlap region.

This commit adds a more informative error if the assertion gets tripped,
and raises a more informative warning upstream if all samples are in the
overlap region

Signed-off-by: Michael Oberst <[email protected]>

* feat: Add function that can be used with target_units

`refute.filter_dataframe(df)` can be used to filter a dataframe to units
that are in the overlap/support region.

Signed-off-by: Michael Oberst <[email protected]>

* docs: Clarify notebook intro

Signed-off-by: Michael Oberst <[email protected]>

* feat: Clarify how to read rules in refuter output

Signed-off-by: Michael Oberst <[email protected]>

* fix: Return a copy when filtering

Signed-off-by: Michael Oberst <[email protected]>

* docs: Update notebook with Lalonde example

Signed-off-by: Michael Oberst <[email protected]>

* docs: Add return to docstring

Signed-off-by: Michael Oberst <[email protected]>

* docs: Add citation for pricing problem

Signed-off-by: Michael Oberst <[email protected]>

* feat: Change progressbar error to warning

Signed-off-by: Michael Oberst <[email protected]>

* chore: Update lockfile

Signed-off-by: Michael Oberst <[email protected]>

Signed-off-by: Michael Oberst <[email protected]>
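
As a rough illustration of what the "Boolean rules" and the `refute.filter_dataframe(df)` item in the message above refer to, the sketch below evaluates a rule set in disjunctive normal form (an OR of AND-clauses) against a few records and keeps a copy of the covered ones. Everything here is an assumption made for illustration: the names `satisfies` and `filter_records`, the rule representation as lists of callables, the feature names, and the use of plain dicts instead of a dataframe. It is not the representation used inside the OverRule code added by this commit.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
# A literal is a predicate on one record; a clause is an AND of literals;
# a rule set in disjunctive normal form (DNF) is an OR of clauses.
Literal = Callable[[Record], bool]
Clause = List[Literal]


def satisfies(ruleset: List[Clause], record: Record) -> bool:
    """Return True if at least one clause has all of its literals satisfied."""
    return any(all(literal(record) for literal in clause) for clause in ruleset)


def filter_records(ruleset: List[Clause], records: List[Record]) -> List[Record]:
    """Return copies of the records covered by the rule set (inputs are not modified)."""
    return [dict(record) for record in records if satisfies(ruleset, record)]


# Hypothetical rule set: (age <= 40 AND education >= 12) OR (married)
ruleset: List[Clause] = [
    [lambda r: r["age"] <= 40, lambda r: r["education"] >= 12],
    [lambda r: r["married"]],
]

records = [
    {"age": 30, "education": 16, "married": False},  # covered by the first clause
    {"age": 55, "education": 10, "married": True},   # covered by the second clause
    {"age": 60, "education": 8, "married": False},   # covered by neither clause
]

kept = filter_records(ruleset, records)
```

Returning copies mirrors the "Return a copy when filtering" fix listed above: edits to `kept` do not affect `records`.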
moberst committed Jan 18, 2023
1 parent c0de390 commit 3ab0e2a
Showing 15 changed files with 2,892 additions and 17 deletions.
9 changes: 9 additions & 0 deletions docs/source/dowhy.causal_refuters.rst
@@ -12,6 +12,15 @@ dowhy.causal\_refuters.add\_unobserved\_common\_cause module
    :undoc-members:
    :show-inheritance:

dowhy.causal\_refuters.assess_overlap module
-----------------------------------------------------------

.. automodule:: dowhy.causal_refuters.assess_overlap
    :members:
    :undoc-members:
    :show-inheritance:


dowhy.causal\_refuters.bootstrap\_refuter module
------------------------------------------------

880 changes: 880 additions & 0 deletions docs/source/example_notebooks/dowhy_refuter_assess_overlap.ipynb

Large diffs are not rendered by default.

126 changes: 126 additions & 0 deletions dowhy/causal_refuters/assess_overlap.py
@@ -0,0 +1,126 @@
import logging
import warnings
from typing import List, Optional

from dowhy.causal_refuter import CausalRefuter
from dowhy.causal_refuters.assess_overlap_overrule import OverlapConfig, OverruleAnalyzer, SupportConfig

logger = logging.getLogger(__name__)


class AssessOverlap(CausalRefuter):
    """Assess Overlap

    This class implements the OverRule algorithm for assessing support and overlap via Boolean Rulesets, from [1].

    [1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of
    Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International
    Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
    """

    def __init__(self, *args, **kwargs):
        """
        Initialize the parameters required for the refuter.

        Arguments are passed through to the `refute_estimate` method. See dowhy.causal_refuters.assess_overlap_overrule
        for the definition of the `SupportConfig` and `OverlapConfig` dataclasses that define optimization
        hyperparameters.

        .. warning::
            This method is only compatible with estimators that use backdoor adjustment, and will attempt to acquire
            the set of backdoor variables via `self._target_estimand.get_backdoor_variables()`.

        :param: cat_feats: List[str]: List of categorical features, all others will be discretized
        :param: support_config: SupportConfig: DataClass with configuration options for learning support rules
        :param: overlap_config: OverlapConfig: DataClass with configuration options for learning overlap rules
        :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
            region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
        :param: overrule_verbose: bool: Enable verbose logging of optimization output, defaults to False
        :param: support_only: bool: Only fit rules to describe the support region (do not fit overlap rules), defaults to False
        :param: overlap_only: bool: Only fit rules to describe the overlap region (do not fit support rules), defaults to False
        """
        super().__init__(*args, **kwargs)
        # TODO: Check that the target estimand has backdoor variables?
        self._backdoor_vars = self._target_estimand.get_backdoor_variables()
        self._cat_feats = kwargs.pop("cat_feats", [])
        self._support_config = kwargs.pop("support_config", None)
        self._overlap_config = kwargs.pop("overlap_config", None)
        self._overlap_eps = kwargs.pop("overlap_eps", 0.1)
        if self._overlap_eps < 0 or self._overlap_eps > 1:
            raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {self._overlap_eps}")
        self._support_only = kwargs.pop("support_only", False)
        self._overlap_only = kwargs.pop("overlap_only", False)
        self._overrule_verbose = kwargs.pop("overrule_verbose", False)

    def refute_estimate(self, show_progress_bar=False):
        """
        Learn overlap and support rules.

        :param show_progress_bar: Not implemented, a warning is issued if set to True, defaults to False
        :type show_progress_bar: bool
        :returns: object of class OverruleAnalyzer
        """
        if show_progress_bar:
            warnings.warn("No progress bar is available for OverRule")

        return assess_support_and_overlap_overrule(
            data=self._data,
            backdoor_vars=self._backdoor_vars,
            treatment_name=self._treatment_name,
            cat_feats=self._cat_feats,
            overlap_config=self._overlap_config,
            support_config=self._support_config,
            overlap_eps=self._overlap_eps,
            support_only=self._support_only,
            overlap_only=self._overlap_only,
            verbose=self._overrule_verbose,
        )


def assess_support_and_overlap_overrule(
    data,
    backdoor_vars: List[str],
    treatment_name: str,
    cat_feats: List[str] = [],
    overlap_config: Optional[OverlapConfig] = None,
    support_config: Optional[SupportConfig] = None,
    overlap_eps: float = 0.1,
    support_only: bool = False,
    overlap_only: bool = False,
    verbose: bool = False,
):
    """
    Learn support and overlap rules using OverRule.

    :param data: Data containing backdoor variables and treatment name
    :param backdoor_vars: List of backdoor variables. Support and overlap rules will only be learned with respect to
        these variables
    :type backdoor_vars: List[str]
    :param treatment_name: Treatment name
    :type treatment_name: str
    :param cat_feats: Categorical features
    :type cat_feats: List[str]
    :param overlap_config: Configuration for learning overlap rules
    :type overlap_config: OverlapConfig
    :param support_config: Configuration for learning support rules
    :type support_config: SupportConfig
    :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
        region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
    :param: support_only: bool: Only fit the support region
    :param: overlap_only: bool: Only fit the overlap region
    :param: verbose: bool: Enable verbose logging of optimization output, defaults to False
    """
    analyzer = OverruleAnalyzer(
        backdoor_vars=backdoor_vars,
        treatment_name=treatment_name,
        cat_feats=cat_feats,
        overlap_config=overlap_config,
        support_config=support_config,
        overlap_eps=overlap_eps,
        support_only=support_only,
        overlap_only=overlap_only,
        verbose=verbose,
    )
    analyzer.fit(data)
    return analyzer
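
The `overlap_eps` docstring above defines the overlap region as the open interval `(overlap_eps, 1 - overlap_eps)` of propensity scores. The sketch below shows that labeling step in isolation, together with a degenerate-case check like the one described in the commit message (warn rather than fit overlap rules when every sample is already in the overlap region). The helper names `label_overlap` and `needs_overlap_rules` are hypothetical, and the propensity scores are hard-coded here; in the committed code they would come from a fitted propensity model, which this sketch does not reproduce.

```python
import warnings
from typing import List, Sequence


def label_overlap(propensity: Sequence[float], overlap_eps: float = 0.1) -> List[bool]:
    """Mark each sample as inside the open interval (overlap_eps, 1 - overlap_eps)."""
    if overlap_eps < 0 or overlap_eps > 1:
        raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {overlap_eps}")
    return [overlap_eps < p < 1 - overlap_eps for p in propensity]


def needs_overlap_rules(labels: Sequence[bool]) -> bool:
    """Return False (with a warning) when every sample is already in the overlap region."""
    if all(labels):
        warnings.warn("All samples are in the overlap region; there is nothing to separate")
        return False
    return True


# Hypothetical propensity scores P(treatment = 1 | backdoor variables)
scores = [0.02, 0.10, 0.35, 0.80, 0.97]
labels = label_overlap(scores)
fit_overlap = needs_overlap_rules(labels)
```

Because the interval is open, a score exactly equal to `overlap_eps` (the second entry above) is treated as outside the overlap region.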