[New refutation] Add OverRule for learning Boolean rules to describe support/overlap (#791)

* feat: Add scaffolding for overrule, including basic test
* feat: Update dependencies for overrule
* feat: Add the full set of overrule code
* style: Black styling on overrule code
  style: Black formatting on beam_search
  style: Black formatting for ruleset
  style: Black format utils
  style: Black formatting on load_process_data_BCS
* fix: Change np.matmul to np.dot
* fix: Update to appropriate matmul notation for latest CVXPY
* feat: Remove unnecessary overrule code
* feat: Minimum working example for OverRule
* test: Minimum viable test for OverRule
  test: Fix test to work with new interface
* feat: Print rules with option to recompute metrics
* feat: Improve printing of results for refutation
* feat: Pass in additional arguments to OverRule
* docs: Add docstrings to ruleset.py
  docs: Docstrings for assess_overlap.py
* feat: Update logger
* docs: Add additional docstrings
* fix: Path bug
* feat: Add notebook with a toy example to demonstrate OverRule
* feat: Add back using LP coeff by default
* docs: Typing and docstrings
  docs: Consistent module docstrings
  docs: Add docstrings to overrule/utils.py
  docs: Add docstrings to overrule/BCS/beam_search.py
  docs: Add docstrings and typing to load_process_data_BCS
  docs: Add docstrings and type hints to BCS/overlap_boolean_rule.py
* docs: Fix and rename notebook for demonstrating overrule
* feat: Use default_rng instead of setting a global seed in sample_Unif
* fix: Use rng in place of numpy.random
* fix: Replace list with numpy array to fix type error
* docs: Fix type hint on ref_range
* feat: Add option to only fit overlap or support rules
* docs: Flesh out example notebook with parameters
* feat: Add thresh_override as an argument for configuration
* feat: Functional API for overrule has defaults
* docs: Add API reference
* ci: Fix support rule test
* chore: Update poetry.lock for cvxpy dependency
* feat: Remove `XGBClassifier` as default classifier
  To avoid dependency on `xgboost`, replace `XGBClassifier` as the default propensity score model with `RandomForestClassifier` from `sklearn`
* fix: Fix logic so that when verbose=True, silent=False
* feat: Remove seaborn dependency from overrule notebook
* feat: Add option to pass random seed to support estimation
* docs: Update notebook to use random seed on support estimation
* fix: Typo
* feat: Add PSID dataset (observational controls for Lalonde)
* fix: Prevent fitting overlap rules if all samples in overlap region
  One of the assertions in OverlapBooleanRule will trip if all samples are in the overlap region. This commit adds a more informative error if the assertion gets tripped, and raises a more informative warning upstream if all samples are in the overlap region
* feat: Add function that can be used with target_units
  `refute.filter_dataframe(df)` can be used to filter a dataframe to units that are in the overlap/support region.
* docs: Clarify notebook intro
* feat: Clarify how to read rules in refuter output
* fix: Return a copy when filtering
* docs: Update notebook with Lalonde example
* docs: Add return to docstring
* docs: Add citation for pricing problem
* feat: Change progressbar error to warning
* chore: Update lockfile

Signed-off-by: Michael Oberst <[email protected]>
py-why · Jan 18, 2023 · 3ab0e2a
1 parent c0de390
Showing 15 changed files with 2,892 additions and 17 deletions.
diff --git a/docs/source/dowhy.causal_refuters.rst b/docs/source/dowhy.causal_refuters.rst
@@ -12,6 +12,15 @@ dowhy.causal\_refuters.add\_unobserved\_common\_cause module
    :undoc-members:
    :show-inheritance:
 
+dowhy.causal\_refuters.assess_overlap module
+-----------------------------------------------------------
+
+.. automodule:: dowhy.causal_refuters.assess_overlap
+   :members:
+   :undoc-members:
+   :show-inheritance:
+
+
 dowhy.causal\_refuters.bootstrap\_refuter module
 ------------------------------------------------
 

diff --git a/docs/source/example_notebooks/dowhy_refuter_assess_overlap.ipynb b/docs/source/example_notebooks/dowhy_refuter_assess_overlap.ipynb
diff --git a/dowhy/causal_refuters/assess_overlap.py b/dowhy/causal_refuters/assess_overlap.py
@@ -0,0 +1,126 @@
+import logging
+import warnings
+from typing import List, Optional
+
+from dowhy.causal_refuter import CausalRefuter
+from dowhy.causal_refuters.assess_overlap_overrule import OverlapConfig, OverruleAnalyzer, SupportConfig
+
+logger = logging.getLogger(__name__)
+
+
+class AssessOverlap(CausalRefuter):
+    """Assess Overlap
+
+    This class implements the OverRule algorithm for assessing support and overlap via Boolean Rulesets, from [1].
+
+    [1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of
+    Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International
+    Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
+    """
+
+    def __init__(self, *args, **kwargs):
+        """
+        Initialize the parameters required for the refuter.
+
+        Arguments are passed through to the `refute_estimate` method. See dowhy.causal_refuters.assess_overlap_overrule
+        for the definition of the `SupportConfig` and `OverlapConfig` dataclasses that define optimization
+        hyperparameters.
+
+        .. warning::
+            This method is only compatible with estimators that use backdoor adjustment, and will attempt to acquire
+            the set of backdoor variables via `self._target_estimand.get_backdoor_variables()`.
+
+        :param: cat_feats: List[str]: List of categorical features, all others will be discretized
+        :param: support_config: SupportConfig: DataClass with configuration options for learning support rules
+        :param: overlap_config: OverlapConfig: DataClass with configuration options for learning overlap rules
+        :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
+            region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
+        :param: overrule_verbose: bool: Enable verbose logging of optimization output, defaults to False
+        :param: support_only: bool: Only fit rules to describe the support region (do not fit overlap rules), defaults to False
+        :param: overlap_only: bool: Only fit rules to describe the overlap region (do not fit support rules), defaults to False
+        """
+        super().__init__(*args, **kwargs)
+        # TODO: Check that the target estimand has backdoor variables?
+        self._backdoor_vars = self._target_estimand.get_backdoor_variables()
+        self._cat_feats = kwargs.pop("cat_feats", [])
+        self._support_config = kwargs.pop("support_config", None)
+        self._overlap_config = kwargs.pop("overlap_config", None)
+        self._overlap_eps = kwargs.pop("overlap_eps", 0.1)
+        if self._overlap_eps < 0 or self._overlap_eps > 1:
+            raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {self._overlap_eps}")
+        self._support_only = kwargs.pop("support_only", False)
+        self._overlap_only = kwargs.pop("overlap_only", False)
+        self._overrule_verbose = kwargs.pop("overrule_verbose", False)
+
+    def refute_estimate(self, show_progress_bar=False):
+        """
+        Learn overlap and support rules.
+
+        :param show_progress_bar: Not supported by OverRule; a warning is issued if set to True,
+            defaults to False
+        :type show_progress_bar: bool
+        :returns: object of class OverruleAnalyzer
+        """
+        if show_progress_bar:
+            warnings.warn("No progress bar is available for OverRule")
+
+        return assess_support_and_overlap_overrule(
+            data=self._data,
+            backdoor_vars=self._backdoor_vars,
+            treatment_name=self._treatment_name,
+            cat_feats=self._cat_feats,
+            overlap_config=self._overlap_config,
+            support_config=self._support_config,
+            overlap_eps=self._overlap_eps,
+            support_only=self._support_only,
+            overlap_only=self._overlap_only,
+            verbose=self._overrule_verbose,
+        )
+
+
+def assess_support_and_overlap_overrule(
+    data,
+    backdoor_vars: List[str],
+    treatment_name: str,
+    cat_feats: List[str] = [],
+    overlap_config: Optional[OverlapConfig] = None,
+    support_config: Optional[SupportConfig] = None,
+    overlap_eps: float = 0.1,
+    support_only: bool = False,
+    overlap_only: bool = False,
+    verbose: bool = False,
+):
+    """
+    Learn support and overlap rules using OverRule.
+
+    :param data: Data containing backdoor variables and treatment name
+    :param backdoor_vars: List of backdoor variables. Support and overlap rules will only be learned with respect to
+        these variables
+    :type backdoor_vars: List[str]
+    :param treatment_name: Treatment name
+    :type treatment_name: str
+    :param cat_feats: Categorical features
+    :type cat_feats: List[str]
+    :param overlap_config: Configuration for learning overlap rules
+    :type overlap_config: OverlapConfig
+    :param support_config: Configuration for learning support rules
+    :type support_config: SupportConfig
+    :param: overlap_eps: float: Defines the range of propensity scores for a point to be considered in the overlap
+        region, with the range defined as `(overlap_eps, 1 - overlap_eps)`, defaults to 0.1
+    :param: support_only: bool: Only fit the support region
+    :param: overlap_only: bool: Only fit the overlap region
+    :param: verbose: bool: Enable verbose logging of optimization output, defaults to False
+    """
+    analyzer = OverruleAnalyzer(
+        backdoor_vars=backdoor_vars,
+        treatment_name=treatment_name,
+        cat_feats=cat_feats,
+        overlap_config=overlap_config,
+        support_config=support_config,
+        overlap_eps=overlap_eps,
+        support_only=support_only,
+        overlap_only=overlap_only,
+        verbose=verbose,
+    )
+    analyzer.fit(data)
+    return analyzer
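
The docstrings in this diff define the overlap region as the set of points whose propensity score lies in `(overlap_eps, 1 - overlap_eps)`. The following is a minimal, standalone sketch of that definition only. The helper name `in_overlap_region` and the sample scores are hypothetical and not part of DoWhy, and the sketch assumes propensity scores are already available rather than estimated by a classifier.

```python
from typing import List, Sequence


def in_overlap_region(propensity: Sequence[float], overlap_eps: float = 0.1) -> List[bool]:
    """Flag points whose propensity score lies in (overlap_eps, 1 - overlap_eps).

    Hypothetical helper for illustration only; not part of the DoWhy API.
    """
    # Same bounds check as the one shown in AssessOverlap.__init__ in the diff.
    if overlap_eps < 0 or overlap_eps > 1:
        raise ValueError(f"Value of `overlap_eps` must be in [0, 1], got {overlap_eps}")
    lower, upper = overlap_eps, 1 - overlap_eps
    # The interval is open, so scores exactly on a boundary are excluded.
    return [lower < p < upper for p in propensity]


# Made-up propensity scores, chosen to hit both boundaries and the interior.
scores = [0.02, 0.10, 0.35, 0.80, 0.95]
mask = in_overlap_region(scores, overlap_eps=0.1)
print(mask)  # [False, False, True, True, False]
```

Note that a value of `overlap_eps` above 0.5 passes the bounds check but yields an empty interval, so no point would be flagged as being in the overlap region.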