Skip to content

Networkx Proposals For MixedEdgeGraph Class To Enable Causal Graphs

Adam Li edited this page Jul 26, 2022 · 2 revisions

Networkx is the standard Python package for representing graphs, performing operations on graphs and performing algorithms on graphs. It also enables users to draw graphs. Networkx graphs only have one type of edge, either directed, or undirected.

Causality uses graphs with different types of edges in the same graph. For example:

  • ADMG: a causal DAG with directed and bidirected edges.
  • CPDAG: an equivalence class of causal graphs with directed and undirected edges.
  • PAG: an equivalence class of causal graphs with directed edges, undirected edges and directed edges with circular endpoints.

MixedEdge Graph Class Extension

We propose a MixedEdge graph class that allows arbitrary combinations of existing Graph or DiGraph classes. This could of course be extended to support MultiMixedEdge graph class in the future as well.

class MixedEdgeGraph:
    def __init__(self, graphs: List, edge_types: List):
        # do some error checking
        ...

        self.graphs = graphs
        self.edge_types = edge_types

    def add_edge(self, u, v, edge_type):
        if edge_type not in self.edge_types:
            raise Error

        edge_idx = edge_types.index(edge_type)
        self.graphs[edge_idx].add_edge(u, v)

    def has_edge(self, u, v, edge_type):
    
    def remove_edge(self, u, v, edge_type):

    def to_undirected(self):
        # convert all graphs to undirected

    @cached_property
    def adj(self, edge_type='all'):
        # return Adjacencies over all/any type of edge

This would serve as a "new" base class since it would not inherit from any of the existing networkx graphs, but rather serve as a generic API layer for allowing any arbitrary combination of graphs that represent different edge types, while providing users with highly efficient and robust data structures: e.g. AdjacencyView. Here are a few examples of usage in downstream packages. These explicit implementations of causal graphs ideally should be lightweight and extend the networkx-API layer to provide more nuanced functionality for causality users.

Examples of Causality-Specific Graph Classes (Not to be added to networkx at the moment)

Although all graph classes could be represented using just a MixedEdgeGraph instantiation, it is not desirable. For one, having a subclass implementation of common causal graphs would enable i) error checking of edge operations and graph initialization, ii) light-weight convenience API for common causal-graph operations (i.e. access to specific types of edges and causality-specific notions such as "c-components"), and iii) provide additional causality-specific graph operations (i.e. do-operation).

class ADMG(MixedEdgeGraph):
    def __init__(self, directed_data, bidirected_data):
        super().__init__([directed_data, bidirected_data], edge_types=['directed', 'bidirected'])

    def do(self, u):
        # apply do intervention and intervene on the graph

    def soft_do(self, nodes):
        # apply soft intervention on a set of nodes by adding a F-node for example
    
    @property
    def c_components(self):
        return nx.connected_components(self.bidirected_edge_graph)

    ...

We could also subclass the MixedEdgeGraph with equivalence classes, where we have again different types of edges.

class CPDAG(MixedEdgeGraph):
    def __init__(self, directed_data, undirected_data):
        super().__init__([directed_data, undirected_data], edge_types=['directed', 'undirected'])

    # API layer for users to work with the specific types of edges
    def add_undirected_edge(self, u, v):
        super().add_edge(u, v, edge_type='undirected')
    
    ...

# We can also use the ADMG defined functions in the PAG as well
class PAG(MixedEdgeGraph, ADMG):
    def __init__(self, directed_data, undirected_data, circular_data):
        super().__init__([directed_data, undirected_data, circular_data], edge_types=['directed', 'undirected', 'circle'])

    # API layer for users to work with the specific types of edges
    def add_circle_endpoint(self, u, v):
        super().add_edge(u, v, edge_type='circle')

    def potential_parents(self, u):
        # return the directed edge parents of u and circle edges pointing to u
    
    ...

# instantiate the PAG
pag = PAG(nx.DiGraph, nx.Graph, nx.DiGraph)

MixedEdge Graph Algorithms

MixedEdge graphs are useful mainly in causality, so we propose similar to bipartite submodule in networkx to add a causal submodule that has graph traversal algorithms for mixededge graphs. The most relevant one would be that of "m-separation", which generalizes "d-separation". Similar to bipartite in networkx, we can also have some "convention" that is used in networkx to represent the different types of edges we want to analyze in each algorithm. In this case, networkx has the convention that there is an edge_type named 'directed' and an edge type named 'bidirected' when running m_separated.

def m_separated(G, x, y, z, directed_edge_name='directed', bidirected_edge_name='bidirected'):
    # check that G is a mixed edge graph with the edge-graph names corresponding to
    # 'directed' and 'bidirected'.
    check_G(G)

    # convert G to a DAG by converting all bidirected edges to unobserved confounders
    dag_G = _convert_latent_to_unobserved_confounders(G)

    # run d-separation
    return d_separated(dag_G, x, y, z)

def _convert_latent_to_unobserved_confounders(G: ADMG) -> ADMG:
    """Convert all bidirected edges to unobserved confounders.

    Parameters
    ----------
    G : ADMG
        A causal graph with bidirected edges.

    Returns
    -------
    G_copy : ADMG
        A networkx DiGraph that is a fully specified DAG with unobserved
        variables added in place of bidirected edges.
    """
    uc_label = "Unobserved Confounders"
    G_copy = G.copy()

    # for every bidirected edge, add a new node
    for idx, latent_edge in enumerate(G.c_component_graph.edges):
        G_copy.add_node(f"U{idx}", label=uc_label, observed="no")

        # then add edges from the new UC to the nodes
        G_copy.add_edge(f"U{idx}", latent_edge[0])
        G_copy.add_edge(f"U{idx}", latent_edge[1])

        # remove the actual bidirected edge
        G_copy.remove_bidirected_edge(*latent_edge)

    return G_copy

Other algorithms also can be proposed, such as discriminating paths, uncovered paths, etc. These are traversal algorithms that operate over specifically mixed edges.

def discriminating_path(G, directed_edge_name='directed', bidirected_edge_name='bidirected'):
    # compute discriminating paths in the graph

def is_discriminating_path(G, path):
    # check if the path is a discriminating path

def has_discriminating_path(G, u, v, w):
    # check if there is a discriminating path along u, v, w

Note this is similar to other path algorithms, where networkx provides a function to i) enumerate all paths of a certain type, ii) check if there is a path, iii) check if a path exhibits certain properties. (e.g. https://networkx.org/documentation/stable/reference/algorithms/shortest_paths.html)

Similar to bipartite in networkx, we could have common basic functions to help causality-focused users:

Networkx Considerations

With the addition of a new graph class, existing algorithms inside networkx should be modified to either i) raise an error if it doesn't support mixedEdgeGraphs, or ii) work with mixed edge graphs or iii) some documentation for how to work it with the existing codebase.

Update as of 07-25-22: I am inclined to say that the documentation for MixedEdgeGraph should state that any algorithm within https://networkx.org/documentation/stable/reference/algorithms/approximation.html#module-networkx.algorithms is not guaranteed to work with MixedEdgeGraph unless specified. Perhaps @dshultz from networkx can comment on this. By making this cutoff of what "will" work with MixedEdgeGraph, we guarantee that users do not accidentally run something that doesn't work. Then we can slowly PR algorithms that make sense to add MixedEdgeGraph functionality.

Errors

TBD

Refactoring Needed

TBD

Documentation

  • topological_sort: document that you can run it using topological_sort(mixed_graph.get_edge_graph('directed')).
  • is_directed_acyclic_graph: document that you can run it using is_directed_acyclic_graph(mixed_graph.get_edge_graph('directed')).

What belongs in networkx vs pywhy?

Fundamental graphical algorithms, such as m-separation should go into networkx. These can possibly have generic usage for researchers who use mixed-edge graphs that are not necessarily "causal". Keeping in line with the fact that networkx provides generic path-algorithms for graphs, we believe that the causal submodule can house many of the fundamental causal graph traversal algorithms. These include fundamental path algorithms for mixed-edge graphs, such as discriminating paths, possibly d-separating paths, uncovered potentially directed paths, potentially directed paths, etc.

This will need discussion with networkx. The exact separation of what goes where should be discussed.

PyWhy on the other hand would house more nuanced causal operations, that align more with causal-inference semantics. For example, structure learning.

Discussion Page

See here for discussions on the proposal outlined here: https://github.com/py-why/dowhy/discussions/525

References

[1] https://github.com/networkx/networkx/discussions/5811