
This document outlines the roadmap for DoWhy as a base library for different causal tasks. While DoWhy started out as a causal effect estimation library, its associated capabilities of manipulating causal graphs, identification, and refutation apply to a wide variety of causal tasks, including prediction, counterfactual estimation, and attribution. Below we outline future directions for DoWhy, defining the planned core capabilities and showing how combinations of these capabilities provide “recipes” for different causal tasks.

This is a work-in-progress. We would love to have contributions and feedback from you to shape DoWhy.

  • If you would like to contribute to any of these features, mark yourself on the corresponding project page, or raise an issue if that's easier.
  • If you would like to suggest any new features or changes, please start a discussion.

Guiding Principle: What functionality should be in DoWhy

Causal Tasks

A guiding principle is that a fundamental causal task that will be useful to many downstream applications belongs in DoWhy (e.g., effect estimation, attribution, causal prediction, counterfactual estimation). A task that makes sense only in specific situations (e.g., optimizing a pipeline system) should instead use DoWhy as a base library, with the resulting functionality living outside of DoWhy. Currently there are four high-level tasks we are planning for DoWhy:

  1. Effect estimation (“Forward” causal inference)
  2. Attribution (“Reverse” causal inference)
  3. Prediction
  4. Counterfactual estimation

As an example of a task outside the scope of DoWhy (but where DoWhy can be used as a base library), consider model explanation and evaluation of bias in a model. Effect estimation or attribution in DoWhy can be repurposed to build a solution for model explanation or fairness, but such solutions are better placed in a separate library focused on responsible ML.
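
As a concrete illustration of the first task, the sketch below runs DoWhy's existing four-step effect-estimation workflow (model, identify, estimate, refute) on a simulated dataset. The method choices are illustrative; any supported identification, estimation, or refutation method can be substituted.

```python
import dowhy.datasets
from dowhy import CausalModel

# Simulated dataset with a known ground-truth average effect (beta=10).
data = dowhy.datasets.linear_dataset(
    beta=10, num_common_causes=3, num_samples=1000, treatment_is_binary=True
)

# 1. Model: encode the causal assumptions as a graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"],
)

# 2. Identify: check whether the target effect is estimable from data.
estimand = model.identify_effect(proceed_when_unidentifiable=True)

# 3. Estimate: compute the effect with a chosen statistical method.
estimate = model.estimate_effect(
    estimand, method_name="backdoor.propensity_score_matching"
)

# 4. Refute: stress-test the estimate against assumption violations.
refutation = model.refute_estimate(
    estimand, estimate, method_name="placebo_treatment_refuter"
)
print(estimate.value, refutation)
```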

Causal Functionalities

In addition, DoWhy should include certain core functionalities that enable the four tasks above. We tag each functionality with its status: implemented, short-term goal, or long-term goal. That said, all functionalities can benefit from more community contributions to make them more useful and robust. For example, while the refutation functionality is implemented for the average causal effect, we would like to extend it to conditional ATE and to other tasks such as prediction and attribution as a long-term goal.

  1. Causal discovery (learning a graph) [Short-term]
  2. Building an SCM (fitting edges of the graph) [Implemented, Pre-release]
  3. Causal identification (checking if a desired quantity is estimable from data) [Implemented]
  4. Refutations (validate causal assumptions) [Implemented][Long-term]
  5. Expressing causal assumptions in a user-friendly way (beyond graphs) [Long-term]
  6. Causal representation learning [Short-term]
  7. Generating interventional distribution (joint or targeted) [Implemented, Pre-release][Short-term]

By combining these functionalities, different causal tasks can be performed, as the sketch below illustrates.
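
For instance, the following minimal sketch combines functionality 2 (building an SCM) with functionality 7 (generating an interventional distribution), using the pre-release dowhy.gcm module. The chain graph, mechanisms, and data are illustrative.

```python
import networkx as nx
import numpy as np
import pandas as pd
from dowhy import gcm

# Observational data from a simple chain X -> Y -> Z.
rng = np.random.default_rng(0)
X = rng.normal(size=1000)
Y = 2 * X + rng.normal(size=1000)
Z = 3 * Y + rng.normal(size=1000)
data = pd.DataFrame(dict(X=X, Y=Y, Z=Z))

# Build an SCM on the causal graph and fit a mechanism to each node.
scm = gcm.StructuralCausalModel(nx.DiGraph([("X", "Y"), ("Y", "Z")]))
gcm.auto.assign_causal_mechanisms(scm, data)  # auto-select mechanisms
gcm.fit(scm, data)

# Sample from the interventional distribution under do(Y := 1).
samples = gcm.interventional_samples(
    scm, {"Y": lambda y: 1}, num_samples_to_draw=500
)
print(samples.head())
```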

Other guiding principles for DoWhy

DoWhy aims to provide:

  1. An easy, user-friendly API for common causal tasks, e.g., effect estimation, prediction, root cause analysis, and counterfactual estimation
  2. Components/methods that can be easily repurposed for new user-defined tasks or advanced scenarios
  3. Task-independent functionality that external libraries can use downstream to solve new problems (e.g., using the causal model for active exploration or planning)
  4. An API that is independent of any specific implementation, so that different (even external) methods can be seamlessly integrated or intermixed in a single analysis
  5. An API that makes input signatures of each method explicit and makes it hard to mix incompatible methods

Proposed API Changes

As DoWhy expands to multiple tasks, a functional API (rather than an object-oriented one) can be more convenient and avoids state-keeping in the main codebase. The API proposal for v1 discusses these proposed changes, and we've started a discussion page here. We are eager to hear your feedback.
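
To make the contrast concrete, here is a purely illustrative sketch of what a functional style could look like. The function names below are hypothetical and are not the actual v1 proposal; see the linked documents for the real design.

```python
# Hypothetical functional style (names are NOT the real v1 API):
# each step is a pure function of explicit inputs, so no state lives
# in a central model object and methods can be freely intermixed.
#
#   estimand = identify_effect(graph, action_nodes=["T"], outcome_node="Y")
#   estimate = estimate_effect(data, estimand, method=some_estimator)
#   checks   = refute_estimate(data, estimand, estimate, method=some_refuter)
#
# Compare with today's object-oriented style, where a CausalModel object
# holds the data, graph, and intermediate results across the four steps.
```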

Current projects

Causal discovery (see Project)

  • Integrate causal discovery algorithms with DoWhy's learn_graph API
  • Allow users to edit the graph in a seamless way (see the sketch below)
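
A sketch of the intended flow, assuming a discovery algorithm has already proposed a candidate graph: the user edits it with plain networkx and serializes it to GML, which CausalModel accepts as its graph input. The graph and edits are illustrative.

```python
import networkx as nx

# Candidate graph proposed by a discovery algorithm (illustrative).
candidate = nx.DiGraph([("W", "T"), ("W", "Y"), ("T", "Y"), ("Y", "T")])

# Domain knowledge: the outcome Y cannot cause the treatment T.
candidate.remove_edge("Y", "T")

# Serialize to GML; dowhy.CausalModel accepts a GML string as its graph.
gml_graph = "\n".join(nx.generate_gml(candidate))
```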

Identification methods (see Project)

  • Implement more advanced general identification methods, especially in the presence of unobserved variables
  • Implement specialized identification strategies for specific graph structures

Estimation methods (see Project)

  • Better interpretation tools for current estimation methods
  • General estimation for estimands beyond adjustment sets (e.g., estimating output of the ID* algorithm)

Refutation methods (see Project)

  • Refutation methods based on synthetic data generation
  • Non-parametric sensitivity analysis, especially for non-linear CATE estimators
  • Prognostic scores as a balance check and other visualizations
  • Bayesian model criticism
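
For reference, these are the refuters already implemented for average effects, which the items above would extend. The snippet reuses the `model`, `estimand`, and `estimate` objects from the effect-estimation sketch earlier on this page.

```python
# Existing refuters for average effects (method names as in DoWhy today).
# Assumes `model`, `estimand`, and `estimate` from the earlier sketch.
for method in [
    "random_common_cause",        # add a synthetic confounder
    "placebo_treatment_refuter",  # replace treatment with random noise
    "data_subset_refuter",        # re-estimate on random data subsets
]:
    print(model.refute_estimate(estimand, estimate, method_name=method))
```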

Prediction methods

  • Common API for out-of-distribution (OoD) prediction, including domain generalization and subgroup generalization
  • Implementation of causal methods for OoD generalization