Practical Guide for Causal Pathways and Sub-group Disparity Analysis

Farnaz Kohankhaki1, Shaina Raza1, Oluwanifemi Bamgbose1,
Deval Pandya1, Elham Dolatabadi1,2,*,
Abstract

In this study, we introduce the application of causal disparity analysis to unveil intricate relationships and causal pathways between sensitive attributes and the targeted outcomes within real-world observational data. Our methodology involves employing causal decomposition analysis to quantify and examine the causal interplay between sensitive attributes and outcomes. We also emphasize the significance of integrating heterogeneity assessment in causal disparity analysis to gain deeper insights into the impact of sensitive attributes within specific sub-groups on outcomes. Our two-step investigation focuses on datasets where race serves as the sensitive attribute. The results on two datasets indicate the benefit of leveraging causal analysis and heterogeneity assessment not only for quantifying biases in the data but also for disentangling their influences on outcomes. We demonstrate that the sub-groups identified by our approach to be affected the most by disparities are the ones with the largest ML classification errors. We also show that grouping the data only based on a sensitive attribute is not enough, and through these analyses, we can find sub-groups that are directly affected by disparities. We hope that our findings will encourage the adoption of such methodologies in future ethical AI practices and bias audits, fostering a more equitable and fair technological landscape.

Introduction

Fairness in data science and machine learning (ML) is indispensable for the responsible development and deployment of ethical artificial intelligence (AI) technologies (Díaz-Rodríguez et al. 2023). Key tools in data science, including Aequitas (Saleiro et al. 2018), AI Fairness 360 (Bellamy et al. 2018), and Fairlearn (Bird et al. 2020) play a pivotal role in addressing fairness challenges in ML models, focusing on concepts such as demographic parity and equalizing statistics across sensitive attribute groups (Saha et al. 2020; Feldman et al. 2015; Zhang and Bareinboim 2018a; Barocas, Hardt, and Narayanan 2023a; Raza et al. 2024). However, these approaches can lead to fairness gerrymandering, where broad fairness across high-level groups masks unfair treatment within sub-groups (Kearns et al. 2018). Sub-group fairness approaches (Yang, Cisse, and Koyejo 2020; Shui et al. 2022) have emerged to address this, aiming to reconcile group and individual fairness notions (Pfohl et al. 2023).

Furthermore, understanding and quantifying the extent to which an observed disparity in outcomes, such as one measured by demographic parity, is attributable to the causal influence of sensitive attributes is crucial in fields including health and the social sciences (Mehrabi et al. 2021; Barocas, Hardt, and Narayanan 2023b; Braveman et al. 2011; Glymour and Hamad 2018). Causality-based fairness frameworks view disparity as the causal effect of sensitive attributes $S$ on outcomes $Y$, raising fundamental questions about how changes in these attributes affect average outcomes (Kearns et al. 2018). These methodologies revolve around a central question: if the sensitive attribute $S$ changed (e.g., from marginalized group $s_1$ to non-marginalized group $s_2$), how would the outcome $Y$ change on average?

Two prominent causal frameworks, structural causal models (SCMs) (Wu et al. 2019) and the potential outcome framework (Khademi et al. 2019), have been used for causal fairness analysis, and in particular to quantify disparity (MacKinnon, Fairchild, and Fritz 2007; Pearl 2014). SCMs assume full knowledge of the causal graph, which makes it possible to decompose the causal effect of any variable along different paths, such as direct and indirect effects. The potential outcome framework (Rubin 2005), by contrast, does not assume that the causal graph is available and instead focuses on estimating the causal effects of treatment variables. However, a common challenge across all causal models is identifiability, that is, whether the causal effects can be uniquely determined from observational data (Morgan and Winship 2015). This poses a critical barrier to applying these notions to real-world scenarios.

Randomized experiments, considered the gold standard for inferring causal relationships in statistics, are often not feasible or cost-effective in the context of disparity analysis (Hariton and Locascio 2018). Therefore, in most cases, the causal relationship must be inferred from observational data rather than controlled experiments. This limitation has spurred a stream of research aiming to address these challenges and develop more practical and effective methodologies for causal fairness analysis. Early literature in the SCM primarily utilized linear and parametric methods, limiting its capacity to offer a universal approach for analyzing natural and social phenomena characterized by non-linearities and interactions (MacKinnon, Fairchild, and Fritz 2007). Later, Pearl introduced the causal mediation formula designed for arbitrary non-parametric models, serving as a valuable tool for decomposing total effects (Pearl 2014). Subsequently, a substantial body of literature emerged, focusing on causal effect decomposition under the rubric of mediation analysis and proposing various optimization problems to adapt the causal framework for fairness analysis (Zhang and Bareinboim 2018b; Wu et al. 2019; Zhang, Wu, and Wu 2016). One notable framework (Plecko and Bareinboim 2024) addresses spurious effects in the decomposition of causal effects and explores the relationships between causal and spurious effects with demographic parity, offering practical insights for data science and fairness considerations.

In fairness research based on causal analysis, a significant focus lies on sub-group analysis and heterogeneity, approached from two perspectives: heterogeneous treatment effects (Wager and Athey 2018), which align directly with our study, and ’counterfactually fair’ algorithms for individuals, a topic not directly relevant to our current research (Kearns et al. 2019; Kavouras et al. 2024). The former involves systematically quantifying variations in the causal impact of sensitive attributes on the outcome of interest across sub-groups (Pearl 2022). Approaches for estimating heterogeneous causal effects include classical non-parametric methods such as nearest-neighbour matching, kernel methods, and series estimation, which are effective in scenarios with a limited number of covariates (Crump et al. 2008; Lee 2009; Willke et al. 2012). More recently, data-driven ML algorithms such as causal forests, which can handle numerous moderating variables, have shown promising results in heterogeneity analysis (Wager and Athey 2018).

Building on the urgency of adopting causal reasoning techniques in fairness analysis, the main aim of this study is to leverage causal analysis for sub-group disparity assessment. First, we demonstrate the application of causal disparity analysis to uncover the intricate relationships and causal pathways between sensitive attributes and the outcome of interest in real-world observational data. Then, we close the loop by employing causal disparity analysis for sub-group fairness within the context of ML, showcasing how a causal-aware approach can enhance sub-group fairness evaluation. Our overarching goal is to pave the way for conducting disparity audits that lay the foundation for ethical and equitable ML. The novelty of this study lies not in the specific methodologies used but in recognizing causal reasoning as a novel technique for conceptualizing and quantifying disparity, making it suitable for promoting fairness in data science.

  • We demonstrate the application of causal disparity analysis to quantify and decompose causal pathways between sensitive attributes and the targeted outcomes in two real-world observational datasets. We show that our approach can uncover hidden disparities, even in cases where observed disparities are nearly zero.

  • We pioneer a novel sub-group discovery method rooted in the concept of Heterogeneity of Treatment Effect, enabling the identification of variations in the magnitude and direction of decomposed causal effects among individuals.

  • We evaluate the efficacy and utility of our proposed causal disparity analysis in a fairness ML experiment. Our method demonstrates its ability to identify biased performance within each sub-group of individuals, particularly those identified quantitatively as most affected by disparities.

Materials and Methods

In this section, we will introduce causal disparity analysis through the lens of counterfactual inference and non-parametric SCM proposed by Pearl (Pearl 2014) and expanded by Zhang et al. (Zhang and Bareinboim 2018b). Following this approach, various causal effects can be defined as the difference between two counterfactual outcomes (Holland 1986; Rubin 1974) along the causal pathway from sensitive attributes (causes) to outcomes. We will elucidate how these effects can be quantitatively measured and estimated from data through the experiments.

Preliminaries

Our study is based on a basic causal structure consisting of four random variables $(Y, S, X, M)$ sampled from an unknown distribution. $S$ represents the sensitive attribute (whose effect we seek to measure), and $Y$ represents the outcome of interest. $X$ represents all non-sensitive attributes, including observed confounders, denoted by $C$, and mediators, denoted by $M$. The lowercase $(y, s, x, m)$ denotes the values that the variables may take. As a running example, $S$ stands for race, $M$ for job title, and $Y$ for income amount. Here we consider two potential outcomes, $Y_{s_1}$ and $Y_{s_2}$, for the sensitive attribute values $S \in \{s_1, s_2\}$. $E[Y_{s,m}]$ stands for $E[Y \mid do(S=s, M=m)]$, which is interpreted as the expectation of the potential outcome $Y$ when the sensitive attribute $S$ is set to $s$ and the mediator variable $M$ is set to $m$.

Sensitive attributes, S𝑆Sitalic_S, that serve as the basis for disparity encompass a range of personal characteristics that have historically been unfairly targeted to differentiate individuals (Mehrabi et al. 2021). These attributes are pivotal in discussions surrounding equity, inclusion, and human rights and are commonly discussed in anti-discrimination laws (Nations 2023), regulations, and human rights frameworks around the globe. Among these attributes, a notable array includes race, nationality, ethnic origin, colour, religion, age, sex, sexual orientation, gender identity or expression, marital status, family status, genetic characteristics, or disability (Verma and Rubin 2018).

In this study, the term sensitive category denotes individuals grouped based on their sensitive attributes. The term sub-group refers to individuals grouped according to the quantity of their estimated causal effects.

Causal Disparity Analysis

Within the context of counterfactual fairness, the causal effect is characterized as the difference between two potential (also called counterfactual) outcomes: one outcome, $Y_{s_1}$, if the sensitive attribute is $s_1$ (for instance, if the individual is female), and the other outcome, $Y_{s_2}$, if it is $s_2$ (in this case, if the individual is not female). Due to the presence of the mediator, the potential outcomes depend not only on the sensitive attribute but also on the mediator values [4]. The observed disparity can therefore be decomposed into a counterfactual direct effect, a counterfactual indirect effect, and a spurious effect. The counterfactual measures of direct and indirect effects are conditional versions of the natural direct and indirect effects introduced by Pearl (Pearl 2014), which are widely used throughout the empirical sciences. Here we define the causal and non-causal fairness criteria used in this study.

Total Variation (TV), also known as demographic parity, represents the statistical difference in the conditional distribution of the outcome between two groups when simply observing $S = s_1$ compared to $S = s_2$:

$$TV(Y) = P(Y \mid S = s_2) - P(Y \mid S = s_1) \tag{1}$$
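As a minimal sketch of Eq. (1) for a binary outcome, TV reduces to the difference in positive-outcome rates between the two sensitive categories. The function name `total_variation` and the toy records below are illustrative assumptions, not part of the study's pipeline.

```python
from typing import Iterable, Tuple


def total_variation(records: Iterable[Tuple[str, int]], s1: str, s2: str) -> float:
    """TV(Y) = P(Y=1 | S=s2) - P(Y=1 | S=s1) for a binary outcome Y."""
    counts = {s1: [0, 0], s2: [0, 0]}  # group -> [n_positive, n_total]
    for s, y in records:
        if s in counts:
            counts[s][0] += y
            counts[s][1] += 1
    for group, (_, total) in counts.items():
        if total == 0:
            raise ValueError(f"no records for group {group!r}")
    rate = {g: pos / total for g, (pos, total) in counts.items()}
    return rate[s2] - rate[s1]


# Hypothetical toy data: (sensitive attribute, binary outcome).
toy = [("s1", 1), ("s1", 0), ("s1", 0), ("s1", 0),
       ("s2", 1), ("s2", 1), ("s2", 0), ("s2", 0)]
print(total_variation(toy, "s1", "s2"))  # 0.5 - 0.25 = 0.25
```

A positive value means the positive outcome is more frequent in the $s_2$ group, matching the sign convention used in the results.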

The counterfactual direct effect (ctf-DE) is the average difference between two potential outcomes when the sensitive attribute transitions from $S = s_1$ (female) to $S = s_2$ (not female), while the mediator is held at whatever value it would naturally have attained under $S = s_1$ (that is, prior to the change), for a specific sub-group of the population, $s$.

$$\text{ctf-DE}_{s_1,s_2}(Y \mid s) = E[Y_{s_2, M_{s_1}} - Y_{s_1, M_{s_1}} \mid s] \tag{2}$$

The counterfactual indirect effect (ctf-IE) is the average difference between two potential outcomes when the sensitive attribute remains constant at $s_1$ (female), while the mediator changes from its value under $s_1$ to whatever value it would have attained for each individual under $s_2$ (not female), for a specific sub-group of the population, $s$.

$$\text{ctf-IE}_{s_1,s_2}(Y \mid s) = E[Y_{s_1, M_{s_2}} - Y_{s_1, M_{s_1}} \mid s] \tag{3}$$

According to Zhang and Bareinboim (2018b), the direct and indirect causal effects can be linearly combined to account for the total variation once an additional term is introduced that captures the spurious relations between $S$ and $Y$ through the confounding variables, $X$:

$$TV_{s_1,s_2}(Y) = DE_{s_1,s_2}(Y \mid s) - IE_{s_2,s_1}(Y \mid s) - SE_{s_2,s_1}(Y) \tag{4}$$

The counterfactual spurious effect (ctf-SE) measures the average difference in outcome $Y$ had $S$ been set to $s_1$ by intervention, compared to settings in which $S$ would naturally be $s_1$. In effect, SE measures all paths between $S$ and $Y$ except the causal ones (direct and indirect):

$$\text{ctf-SE}_{s_1,s_2}(Y \mid s) = E[Y_{s_1} \mid s_2] - E[Y_{s_1} \mid s_1] \tag{5}$$

In order to estimate these counterfactual quantities from data, we assume the presence of unconfoundedness between the sensitive attribute and outcome, along with the assumption of conditional ignorability. Moreover, leveraging the following two assumptions (1) none of the confounders are descendants of S𝑆Sitalic_S and (2) confounders block all backdoor paths from mediators to Y𝑌Yitalic_Y, we can express counterfactual quantities in terms of conditional distributions.
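To make the last step concrete, the sketch below gives a plug-in estimator for discrete confounders and mediators. It assumes that, under the stated assumptions, Eqs. (2) and (3) reduce to the identification formulas of the standard fairness model of Zhang and Bareinboim: ctf-DE is $\sum_{x,m} (E[Y \mid s_2, m, x] - E[Y \mid s_1, m, x]) P(m \mid s_1, x) P(x \mid s)$ and ctf-IE is $\sum_{x,m} E[Y \mid s_1, m, x] (P(m \mid s_2, x) - P(m \mid s_1, x)) P(x \mid s)$. The function name, the `(s, x, m, y)` row layout, and the toy data are hypothetical; this is not the faircause implementation used in the study.

```python
from collections import defaultdict


def plug_in_effects(rows, s1, s2, cond):
    """Plug-in ctf-DE and ctf-IE for the change s1 -> s2 among units with S = cond.

    rows: iterable of (s, x, m, y) with discrete (hashable) x and m.
    Assumes positivity: every (s, x, m) cell that receives weight is observed.
    """
    y_sum = defaultdict(float)  # (s, x, m) -> sum of outcomes
    n_sxm = defaultdict(int)    # (s, x, m) -> count
    n_sx = defaultdict(int)     # (s, x)    -> count
    n_s = defaultdict(int)      # s         -> count
    for s, x, m, y in rows:
        y_sum[(s, x, m)] += y
        n_sxm[(s, x, m)] += 1
        n_sx[(s, x)] += 1
        n_s[s] += 1

    def mean_y(s, x, m):
        """Empirical E[Y | S=s, M=m, X=x]."""
        n = n_sxm.get((s, x, m), 0)
        if n == 0:
            raise ValueError(f"positivity violated: no data for cell {(s, x, m)}")
        return y_sum[(s, x, m)] / n

    def p_m(m, s, x):
        """Empirical P(M=m | S=s, X=x)."""
        n = n_sx.get((s, x), 0)
        if n == 0:
            raise ValueError(f"positivity violated: no data for {(s, x)}")
        return n_sxm.get((s, x, m), 0) / n

    xs = {x for (s, x) in n_sx if s == cond}
    ms = {m for (_, _, m) in n_sxm}
    de = ie = 0.0
    for x in xs:
        p_x = n_sx[(cond, x)] / n_s[cond]  # P(X=x | S=cond)
        for m in ms:
            w1, w2 = p_m(m, s1, x), p_m(m, s2, x)
            if w1 > 0:  # direct: change S in the outcome model, mediator as under s1
                de += (mean_y(s2, x, m) - mean_y(s1, x, m)) * w1 * p_x
            if w2 != w1:  # indirect: keep S = s1, shift mediator from s1 to s2
                ie += mean_y(s1, x, m) * (w2 - w1) * p_x
    return {"ctf-DE": de, "ctf-IE": ie}


# Hypothetical toy data: y = 0.5 * [s == s2] + 0.25 * m,
# with P(m=1 | s1, x) = 0.25 and P(m=1 | s2, x) = 0.75 for both values of x.
rows = []
for s, n_m1 in (("s1", 1), ("s2", 3)):
    for x in (0, 1):
        for i in range(4):
            m = 1 if i < n_m1 else 0
            rows.append((s, x, m, 0.5 * (s == "s2") + 0.25 * m))

print(plug_in_effects(rows, "s1", "s2", cond="s1"))
# {'ctf-DE': 0.5, 'ctf-IE': 0.125}
```

With continuous or high-dimensional covariates the cell counts become too sparse for this tabulation, which is one reason model-based estimators are used in practice.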

Sub-group discovery for heterogeneity assessment

We conduct sub-group discovery to identify and quantify causal effect heterogeneity among individuals from distinct sensitive categories (the sensitive attribute is changed while all other relevant variables are kept constant). To estimate the conditional distributions, we employ the Generalized Random Forest (GRF) (Athey and Wager 2019), an extension of the traditional random forest that maximizes heterogeneity when splitting the nodes of a decision tree. It incorporates a statistical criterion known as the causal tree-splitting criterion, which integrates sensitive attribute assignments and outcome variables. GRF provides estimates of both average and individual causal effects, facilitating the detection of differential effects among sub-groups. This allows individual effects to be clustered and grouped to reveal varying causal impacts. Essentially, GRF compares individuals within a sub-group to counterparts with different sensitive attributes while aiming to closely match all other relevant attributes.
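GRF itself is not reproduced here. As a deliberately simplified stand-in (not the estimator used in this study), the sketch below illustrates the matching idea by exact stratification on a discrete covariate cell: within each cell, the effect is the difference in mean outcome between $s_2$ and $s_1$ individuals. GRF replaces these fixed cells with data-adaptive forest neighbourhoods. The function name and the toy rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cellwise_effects(rows, s1, s2):
    """rows: iterable of (s, x, y), where x is a hashable covariate cell.

    Returns {x: mean(y | s2, x) - mean(y | s1, x)} for cells observed in both groups.
    """
    by_cell = defaultdict(lambda: {s1: [], s2: []})
    for s, x, y in rows:
        if s in (s1, s2):
            by_cell[x][s].append(y)
    return {
        x: mean(groups[s2]) - mean(groups[s1])
        for x, groups in by_cell.items()
        if groups[s1] and groups[s2]  # skip cells without a counterpart
    }


# Hypothetical toy data in which opposite effects cancel on average.
toy_rows = [
    ("s1", "young", 0), ("s1", "young", 1), ("s2", "young", 1), ("s2", "young", 1),
    ("s1", "old", 1), ("s1", "old", 1), ("s2", "old", 1), ("s2", "old", 0),
]
effects = cellwise_effects(toy_rows, "s1", "s2")
print(effects)  # {'young': 0.5, 'old': -0.5}
```

In this toy example the two cells have effects of opposite sign, so the overall difference between groups is zero even though each cell shows a disparity. This is the kind of pattern that heterogeneity assessment is meant to surface.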

Experiment and Setting

We have leveraged causal and sub-group analysis for disparity analysis using the pipeline shown in Figure 1 on two publicly available datasets, where we selected race as the sensitive attribute. We identified individuals with one race as the s1subscript𝑠1s_{1}italic_s start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT group and the rest as the s2subscript𝑠2s_{2}italic_s start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT group. Please refer to Table 1 for more details on the attributes designated as confounders, mediators, and outcomes. Within our pipeline, we utilized the faircause library (Plecko and Bareinboim 2022) and GRF (Wager and Athey 2018) for causal effect estimations.

Figure 1: The steps involved in our approach to achieving fairness in ML classification models through causal pathway decomposition and sub-group analysis.
Table 1: Description of the variables and their corresponding nodes in our causal graph for the disparity analysis experiments. The sensitive attribute is defined as White vs. non-White race, or Asian vs. non-Asian race for HDMA-Asian.

| Dataset | Node type | Variables |
|---|---|---|
| Adult | Sensitive (S) | Race (s1: Non-White, s2: White) |
| Adult | Outcome (Y) | Salary (2 categories: ≤$50K / >$50K) |
| Adult | Confounders (X) | Age (continuous); Applicant Sex (2 categories); Marital Status (7 categories) |
| Adult | Mediators (M) | Education (16 categories); Workclass (8 categories); Occupation (14 categories); Capital Gain (continuous); Capital Loss (continuous); Hours/Week (continuous) |
| HDMA | Sensitive (HDMA-White) | Race (s1: Non-White, s2: White) |
| HDMA | Sensitive (HDMA-Asian) | Race (s1: Asian, s2: Non-Asian) |
| HDMA | Outcome | Loan Status (Accepted/Rejected) |
| HDMA | Confounders | Property Type (2 categories); Owner Occupancy (3 categories); Applicant Sex (2 categories); Loan Type (4 categories) |
| HDMA | Mediators | Loan Amount (continuous); Applicant Income (continuous) |

Adult. The Adult dataset (Becker and Kohavi 1996), commonly known as the ’Census Income’ dataset, is a multivariate dataset designed for predicting, from census data, whether an individual’s annual income exceeds $50,000. The data were extracted by Barry Becker from the 1994 Census database. In this dataset, our goal is to identify and quantify the impact of individuals’ race (specifically White) on income; the variables are listed in detail in Table 1.

HDMA. The Home Mortgage Disclosure Act (HMDA) (government 2016) requires many financial institutions to maintain, report, and publicly disclose mortgage-related information. These publicly accessible data are significant because they provide insight into whether lenders are serving their communities’ housing needs. They also give public officials information to support decision-making and policy formation, and they reveal lending patterns that could be discriminatory. For our experiments, we used the HDMA “Washington State Home Loans, 2016” dataset, comprising a total of 466,566 home loan records within the state of Washington. The variables cover a diverse range of information, including demographic details, location-specific data, loan status, property and loan types, loan purposes, and the originating agency. For HDMA, we conducted two sets of experiments, referred to as HDMA-White and HDMA-Asian, to explore the impact of the presence and absence of specific races on the outcome. Please refer to Table 1 for more details.

Results

Causal aware disparity analysis

In Table 2, we present the total variation along with the decomposed causal effects, estimated using causal forests, for the experimental datasets. All metrics are computed as the difference in outcomes when the sensitive attribute transitions from $s_1$ to $s_2$. Positive values favour individuals in $s_2$, while negative values favour the other sensitive group, $s_1$. Furthermore, we compared the causal effect estimates from our pipeline with those of two other widely used causal decomposition libraries, as shown in Supplementary Table 5.

Table 2: Total Variation (TV), Direct Effect (DE), Indirect Effect (IE), and Spurious Effect (SE) estimations for each experiment.

| Dataset | TV | ctf-DE | ctf-IE | ctf-SE |
|---|---|---|---|---|
| Adult | 0.104 | 0.015 | 0.032 | 0.057 |
| HDMA-White | 0.041 | 0.055 | -0.005 | -0.009 |
| HDMA-Asian | 0.005 | 0.030 | -0.009 | -0.017 |
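The rows of Table 2 can be checked for additivity. The sketch below assumes that the tabulated ctf-IE and ctf-SE values are already signed so that the three components sum to TV (that is, the sign changes and reversed subscripts of Eq. (4) have been absorbed into the reported numbers). The 0.002 tolerance is likewise an assumption, chosen to absorb rounding to three decimals.

```python
# Values copied from Table 2.
table2 = {
    "Adult":      {"TV": 0.104, "DE": 0.015, "IE": 0.032, "SE": 0.057},
    "HDMA-White": {"TV": 0.041, "DE": 0.055, "IE": -0.005, "SE": -0.009},
    "HDMA-Asian": {"TV": 0.005, "DE": 0.030, "IE": -0.009, "SE": -0.017},
}


def decomposition_gap(row):
    """TV minus the sum of the reported direct, indirect, and spurious effects."""
    return row["TV"] - (row["DE"] + row["IE"] + row["SE"])


TOLERANCE = 0.002  # assumed rounding slack
for name, row in table2.items():
    gap = decomposition_gap(row)
    status = "ok" if abs(gap) < TOLERANCE else "mismatch"
    print(f"{name}: |gap| = {abs(gap):.3f} ({status})")
```

For HDMA-Asian the components nearly cancel, leaving a TV close to zero although the direct effect alone is 0.030. This is the case in which a purely observational audit would miss the disparity.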

Our findings reveal that within the Adult dataset, individuals from the $s_2$ group are approximately 10.4 percentage points more likely to have an annual income exceeding \$50,000 than those in the $s_1$ group. Through causal analysis, we find that about 1.5 of these 10.4 percentage points can be directly attributed to the causal influence of the sensitive attribute on annual income (the full distribution of the ctf-DE is shown in the supplementary materials, Figure 5). Approximately 3.2 percentage points can be attributed to an indirect effect mediated through the factors listed in Table 1, while the remaining 5.7 percentage points are attributable to spurious effects.

In both experiments conducted with the HDMA dataset, only small disparities in loan acceptance status were observed through TV: about 4 percentage points in the HDMA-White experiment and nearly zero (0.5 percentage points) in the HDMA-Asian experiment. In the first HDMA experiment, with a TV of about 4 percentage points, a direct effect of 5.5 percentage points from race to loan status was noted, while both the indirect and spurious effects were negligible. The second experiment (last row in Table 2) yielded a more intriguing result: although the TV is nearly zero and the indirect effect is small, there is a direct effect of 3 percentage points alongside a negative spurious effect of approximately 1.7 percentage points on loan status. Additionally, as shown in the supplementary materials (Figures 5 and 6), the full distribution of direct causal effects for both HDMA experiments is skewed positively in favour of White and non-Asian sub-populations.

Figure 2 presents the top attributes identified within each experimental setting for direct causal effect estimation. In the Adult dataset, key attributes include age, education, workclass, occupation, and hours per week, while for HDMA, loan amount and applicant income are pivotal.

Figure 2: Variable importance for top 5 attributes of each experiment. Panels: (a) Adult, (b) HDMA-White, (c) HDMA-Asian.

Sub-group analysis

Drawing on the distribution of the ctf-DE, shown in the supplementary materials (Figure 5), we examined the trade-off between ensuring consistency across datasets with varying distributions and maintaining intra-group alignment on ctf-DE ranges for each dataset to minimize variations. We therefore determined four distinct sub-groups for both experimental datasets, summarized in Table 3 for categorical variables and in Figure 3 for continuous variables. The sub-groups are arranged from negative to positive direct causal effect values. For the Adult dataset, Sub-group 1 represents ctf-DE values less than $-0.01$ (negative effects in favour of individuals in the $s_1$ category), Sub-group 2 comprises ctf-DE values between $-0.01$ and $0.01$ (around zero effects), Sub-group 3 encompasses ctf-DE values between $0.01$ and $0.05$ (positive effects in favour of individuals in the $s_2$ category), and Sub-group 4 contains values greater than $0.05$ (very positive effects in favour of individuals in the $s_2$ category). For the HDMA dataset, the sub-groups are slightly different because the ctf-DE values are skewed positively: Sub-group 1 has ctf-DE values less than $-0.005$, Sub-group 2 has values between $-0.005$ and $0.025$, Sub-group 3 has values between $0.025$ and $0.07$ (in favour of White or non-Asian), and Sub-group 4 has values greater than $0.07$. For the categorical variables, we report the majority and minority categories, with their percentages, for each sensitive (racial) group within each of the sub-groups.
Evidently, the majority and minority categories, as well as the means and standard deviations of the continuous variables, are remarkably similar between the two sensitive categories within each sub-group.
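The sub-group assignment described above can be sketched as a simple binning of individual ctf-DE estimates. The cut points are those stated in the text. The handling of values that fall exactly on a cut point (assigned here to the higher sub-group) is an assumption, as are the names `CUTS` and `assign_subgroup`.

```python
from bisect import bisect_right

# Cut points taken from the text; boundary handling is an assumption.
CUTS = {
    "Adult": (-0.01, 0.01, 0.05),
    "HDMA": (-0.005, 0.025, 0.07),
}


def assign_subgroup(ctf_de: float, dataset: str) -> int:
    """Return the sub-group index (1 to 4) for an individual ctf-DE estimate."""
    # bisect_right counts how many cut points are <= ctf_de.
    return bisect_right(CUTS[dataset], ctf_de) + 1


print([assign_subgroup(v, "Adult") for v in (-0.02, 0.0, 0.03, 0.06)])  # [1, 2, 3, 4]
print([assign_subgroup(v, "HDMA") for v in (-0.01, 0.0, 0.03, 0.08)])   # [1, 2, 3, 4]
```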

Table 3: Summary of sub-group analysis for the Adult experiment with categorical variables. Sub-group 1 represents ctf-DE values less than $-0.01$, Sub-group 2 represents ctf-DE values between $-0.01$ and $0.01$ (around zero effects), Sub-group 3 represents ctf-DE values between $0.01$ and $0.05$, and Sub-group 4 represents ctf-DE values greater than $0.05$. MOI: Machine Operator Inspection.

Adult

| Sub-group (TV) | Race | Education-Num: Majority | Education-Num: Minority | Workclass: Majority | Workclass: Minority | Occupation: Majority | Occupation: Minority |
|---|---|---|---|---|---|---|---|
| Sub-group 1 (TV: 0.04) | White | College (57.6%) | Preschool (0.2%) | Private (86.1%) | State Gov. (0.5%) | Craft Repair (43.1%) | Transportation (0.4%) |
| Sub-group 1 (TV: 0.04) | Non-White | College (57.7%) | Masters (3.8%) | Private (88.5%) | Self-emp. (NI) (3.8%) | Craft Repair (53.8%) | MOI (1.9%) |
| Sub-group 2 (TV: 0.09) | White | High School (32.2%) | Preschool (0.2%) | Private (73.9%) | Without Pay (0.1%) | Craft Repair (13.0%) | Armed-Forces (0.0%) |
| Sub-group 2 (TV: 0.09) | Non-White | High School (34.4%) | Preschool (0.3%) | Private (74.5%) | Without Pay (0.0%) | Other Service (18.2%) | Armed-Forces (0.0%) |
| Sub-group 3 (TV: 0.12) | White | High School (38.3%) | 1st-4th Grade (0.1%) | Private (69.8%) | Without Pay (0.0%) | Prof. Specialty (20.0%) | Armed-Forces (0.0%) |
| Sub-group 3 (TV: 0.12) | Non-White | High School (37.0%) | 11th Grade (0.2%) | Private (68.5%) | Self-emp. (I) (3.0%) | Prof. Specialty (23.2%) | Armed-Forces (0.2%) |
| Sub-group 4 (TV: 0.1) | White | Bachelors (50.7%) | Assoc. Voc (0.2%) | Private (70.0%) | Local Gov. (0.2%) | Exec-managerial (40.1%) | Protective Serv. (0.4%) |
| Sub-group 4 (TV: 0.1) | Non-White | Bachelors (37.7%) | College (3.8%) | Private (66.0%) | Federal Gov. (1.9%) | Prof. Specialty (49.1%) | MOI (1.9%) |

HDMA - White

| Sub-group (TV) | Race | Loan Purpose: Majority | Loan Purpose: Minority | Applicant Sex: Majority | Applicant Sex: Minority | Loan Type: Majority | Loan Type: Minority |
|---|---|---|---|---|---|---|---|
| Sub-group 1 (TV: 0.05) | White | Refinancing (77.0%) | Home Improv. (1.3%) | Male (100.0%) | Male (100.0%) | Conventional (100.0%) | Conventional (100.0%) |
| Sub-group 1 (TV: 0.05) | Non-White | Refinancing (73.7%) | Home Purchase (26.3%) | Male (100.0%) | Male (100.0%) | Conventional (100.0%) | Conventional (100.0%) |
| Sub-group 2 (TV: -0.02) | White | Refinancing (63.9%) | Home Improv. (2.2%) | Male (93.6%) | Female (6.4%) | Conventional (98.8%) | FHA-insured (0.4%) |
| Sub-group 2 (TV: -0.02) | Non-White | Refinancing (49.6%) | Home Improv. (1.0%) | Male (93.7%) | Female (6.3%) | Conventional (99.6%) | FHA-insured (0.1%) |
| Sub-group 3 (TV: 0.03) | White | Home Purchase (69.2%) | Home Improv. (2.6%) | Male (77.7%) | Female (22.3%) | Conventional (71.6%) | FSA/RHS (1.3%) |
| Sub-group 3 (TV: 0.03) | Non-White | Home Purchase (71.8%) | Home Improv. (1.7%) | Male (74.8%) | Female (25.2%) | Conventional (78.1%) | FSA/RHS (0.3%) |
| Sub-group 4 (TV: 0.1) | White | Refinancing (61.8%) | Home Improv. (9.2%) | Male (65.8%) | Female (34.2%) | Conventional (78.5%) | FSA/RHS (0.8%) |
| Sub-group 4 (TV: 0.1) | Non-White | Refinancing (65.9%) | Home Improv. (7.7%) | Male (62.9%) | Female (37.1%) | Conventional (79.9%) | FSA/RHS (0.3%) |

HDMA - Asian

| Sub-group (TV) | Race | Loan Purpose: Majority | Loan Purpose: Minority | Applicant Sex: Majority | Applicant Sex: Minority | Occupation Type: Majority | Occupation Type: Minority |
|---|---|---|---|---|---|---|---|
| Sub-group 1 (TV: 0.03) | White | Refinancing (95.3%) | Home Improv. (0.3%) | Male (95.7%) | Female (4.3%) | Principal (98.1%) | N/A (0.0%) |
| Sub-group 1 (TV: 0.03) | Non-White | Refinancing (92.3%) | Home Improv. (0.1%) | Male (96.2%) | Female (3.8%) | Principal (96.4%) | Non-Principal (3.6%) |
| Sub-group 2 (TV: -0.01) | White | Refinancing (75.3%) | Home Improv. (1.5%) | Male (86.3%) | Female (13.7%) | Principal (90.8%) | N/A (0.0%) |
| Sub-group 2 (TV: -0.01) | Non-White | Refinancing (58.7%) | Home Improv. (0.8%) | Male (85.6%) | Female (14.4%) | Principal (86.2%) | Non-Principal (13.8%) |
| Sub-group 3 (TV: 0.02) | White | Home Purchase (59.3%) | Home Improv. (6.3%) | Male (74.6%) | Female (25.4%) | Principal (90.4%) | N/A (0.0%) |
| Sub-group 3 (TV: 0.02) | Non-White | Home Purchase (67.8%) | Home Improv. (3.3%) | Male (71.0%) | Female (29.0%) | Principal (82.9%) | N/A (0.0%) |
| Sub-group 4 (TV: 0.08) | White | Refinancing (55.6%) | Home Improv. (8.5%) | Male (63.0%) | Female (37.0%) | Principal (94.1%) | N/A (0.1%) |
| Sub-group 4 (TV: 0.08) | Non-White | Refinancing (62.5%) | Home Improv. (5.7%) | Male (58.0%) | Female (42.0%) | Principal (90.2%) | Non-Principal (9.8%) |
Figure 3: Summary of sub-group analysis for the continuous variables: (a) and (b) represent Age and Hours per Week in the Adult dataset; (c) and (d) represent Application Income and Loan Amount in the HDMA-White dataset; and (e) and (f) represent Application Income and Loan Amount in the HDMA-Asian dataset. Sub-group 1 represents ctf-DE values less than -0.01, Sub-group 2 represents ctf-DE values between -0.01 and 0.01 (around zero effects), Sub-group 3 represents ctf-DE values between 0.01 and 0.05, and Sub-group 4 represents ctf-DE values greater than 0.05.
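The four sub-groups used in Table 3 and Figure 3 amount to binning each individual's ctf-DE estimate at three thresholds. A minimal Python sketch of that binning follows. The function name `assign_subgroup` is hypothetical, and the handling of values exactly equal to a threshold (assigned to the higher sub-group here) is an assumption, since the captions do not specify it.

```python
from bisect import bisect_right

# Thresholds taken from the captions of Table 3 and Figure 3.
THRESHOLDS = [-0.01, 0.01, 0.05]


def assign_subgroup(ctf_de):
    """Map a ctf-DE estimate to a sub-group label in {1, 2, 3, 4}."""
    # bisect_right counts how many thresholds are <= ctf_de, so a value
    # exactly on a threshold falls into the higher sub-group (assumption).
    return bisect_right(THRESHOLDS, ctf_de) + 1


# Hypothetical ctf-DE estimates for five individuals.
estimates = [-0.03, 0.0, 0.02, 0.2, -0.005]
labels = [assign_subgroup(v) for v in estimates]
print(labels)  # [1, 2, 3, 4, 2]
```

Keeping the thresholds in one sorted list makes it straightforward to try alternative cut points when inspecting the distribution of ctf-DE values.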

Applications for fairness in Machine Learning

To assess the practical utility of our causal disparity analysis for ML and automated decision-making, we trained an XGBoost classifier on both datasets to predict outcomes (referred to as the outcome node in Table 1). We created an 80-20% train-test split using stratified sampling from all sub-groups (defined by direct causal effect values). We computed classification results for the test set within each sub-group using the AI Fairness 360 (Bellamy et al. 2018) library. Table 4 presents the classification results, with the first row indicating the average performance and the last four rows representing the heterogeneity of performance across sub-groups. Across all experiments, performance varies among the sub-groups, with Sub-group 4 exhibiting worse performance and higher variability for all datasets except for the recall value for the Adult dataset: Adult (Precision: 0.76 (95% CI: 0.016), Recall: 0.71 (0.007), Accuracy: 0.74 (0.007)), HDMA-White (Precision: 0.69 (0.000), Recall: 0.89 (0.013), Accuracy: 0.68 (0.005)), and HDMA-Asian (Precision: 0.68 (0.001), Recall: 0.86 (0.011), Accuracy: 0.67 (0.003)). Overall, the performance of Sub-groups 1 and 4 is lower than that of the other sub-groups.
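The evaluation protocol described above (a split stratified by sub-group, followed by per-sub-group metrics) can be sketched in plain Python. This is an illustration under stated assumptions, not the pipeline used in the study: the classifier itself is omitted, the metrics are computed by hand rather than with AI Fairness 360, and the names `stratified_split` and `metrics_by_subgroup` are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(subgroup, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), drawing test_frac of every sub-group."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for i, g in enumerate(subgroup):
        by_group[g].append(i)
    train_idx, test_idx = [], []
    for g in sorted(by_group):
        idx = by_group[g][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def precision_recall_accuracy(y_true, y_pred):
    """Binary metrics with 1 as the positive class; NaN when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return {
        "precision": tp / (tp + fp) if tp + fp else float("nan"),
        "recall": tp / (tp + fn) if tp + fn else float("nan"),
        "accuracy": (tp + tn) / len(pairs) if pairs else float("nan"),
    }


def metrics_by_subgroup(y_true, y_pred, subgroup):
    """Compute the three metrics separately inside each sub-group."""
    rows = defaultdict(lambda: ([], []))
    for t, p, g in zip(y_true, y_pred, subgroup):
        rows[g][0].append(t)
        rows[g][1].append(p)
    return {g: precision_recall_accuracy(t, p) for g, (t, p) in rows.items()}


# Hypothetical sub-group labels: ten individuals in Sub-group 1, five in Sub-group 4.
train_idx, test_idx = stratified_split([1] * 10 + [4] * 5)
print(len(train_idx), len(test_idx))  # 12 3

# Hypothetical test-set labels and predictions for two sub-groups.
result = metrics_by_subgroup(
    y_true=[1, 1, 0, 1, 0, 1],
    y_pred=[1, 0, 0, 1, 1, 1],
    subgroup=[1, 1, 1, 4, 4, 4],
)
print(result[1]["recall"], result[4]["recall"])  # 0.5 1.0
```

Stratifying on the sub-group label keeps small sub-groups represented in the test set, which matters when performance is later reported per sub-group.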

To better gauge the fairness of our ML classifier in our experiments and evaluate how decisions would differ if the circumstances were different, we plotted the performance gaps for accuracy, recall, and precision between any two sensitive categories ($s_2 - s_1$) in Figure 4 across all sub-groups. Notably, almost 70% of the performance gaps are positive and thus favour the sensitive category $s_2$, which corresponds to White individuals for Adult and HDMA-White, and to the Non-Asian category for HDMA-Asian. As the plots indicate, the absolute value of the aggregated performance gap ranges from 0 to 0.07, whereas within sub-groups, the variation is more pronounced. The largest gaps are observed in Sub-group 1 (Precision: 0.5, Recall: 0.27, and Accuracy: 0.29) for the Adult dataset and in Sub-group 4 for HDMA-White (Precision: 0.06, Recall: 0.09, and Accuracy: 0.06) and HDMA-Asian (Precision: 0.05, Recall: 0.08, and Accuracy: 0.04). The combination of performance gaps and lower performance in Sub-group 4 indicates the model's bias toward one of the sensitive categories. Of particular interest are the significant gaps in recall measures (higher false negative rates for one of the sensitive categories) among individuals in Sub-group 4 for both HDMA experiments.
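The gap described above is the difference in a metric between the two sensitive categories, computed inside each sub-group. A minimal Python sketch for the recall gap is given below. The function names and the toy data are hypothetical, and the sign convention (metric for $s_2$ minus metric for $s_1$) follows the description in the text.

```python
def recall(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        return float("nan")
    return sum(preds_for_positives) / len(preds_for_positives)


def gap_by_subgroup(metric, y_true, y_pred, sensitive, subgroup, s1, s2):
    """Return {sub-group: metric(s2) - metric(s1)} for every sub-group."""
    gaps = {}
    for g in sorted(set(subgroup)):
        scores = {}
        for s in (s1, s2):
            # Keep only rows in sub-group g that belong to category s.
            rows = [
                (t, p)
                for t, p, a, k in zip(y_true, y_pred, sensitive, subgroup)
                if k == g and a == s
            ]
            scores[s] = metric([t for t, _ in rows], [p for _, p in rows])
        gaps[g] = scores[s2] - scores[s1]
    return gaps


# Toy data (hypothetical): every individual has a positive true label.
y_true = [1, 1, 1, 1, 1, 1, 1, 1]
y_pred = [1, 1, 1, 0, 1, 1, 1, 1]
sensitive = ["s2", "s2", "s1", "s1", "s2", "s2", "s1", "s1"]
subgroup = [4, 4, 4, 4, 2, 2, 2, 2]

gaps = gap_by_subgroup(recall, y_true, y_pred, sensitive, subgroup, "s1", "s2")
print(gaps)  # {2: 0.0, 4: 0.5}
```

Because the false negative rate equals one minus recall, a positive recall gap in a sub-group corresponds to a higher false negative rate for category $s_1$ in that sub-group.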

Figure 4: Difference in performance metrics including accuracy, precision, and recall between sensitive groups: (a) Adult, (b) HDMA-White, (c) HDMA-Asian.
Table 4: The XGBoost model classification performance.
Adult (mean (95% CI)) HDMA-White (mean (95% CI)) HDMA-Asian (mean (95% CI))
Precision Recall Accuracy Precision Recall Accuracy Precision Recall Accuracy
Entire Test Set 0.77(0.012) 0.66(0.004) 0.87(0.003) 0.78(0.001) 0.95(0.006) 0.76(0.003) 0.78(0.001) 0.96(0.005) 0.77(0.002)
Sub-group 1 0.76(0.016) 0.61(0.008) 0.77(0.007) 0.77(0.002) 0.99(0.001) 0.77(0.002) 0.78(0.001) 0.99(0.005) 0.77(0.003)
Sub-group 2 0.78(0.021) 0.63(0.004) 0.93(0.002) 0.82(0.001) 0.99(0.003) 0.81(0.001) 0.82(0.001) 0.99(0.004) 0.81(0.002)
Sub-group 3 0.78(0.012) 0.66(0.003) 0.87(0.003) 0.83(0.001) 0.99(0.004) 0.82(0.001) 0.79(0.001) 0.96(0.004) 0.77(0.001)
Sub-group 4 0.76(0.009) 0.71(0.007) 0.74(0.007) 0.69(0.000) 0.89(0.013) 0.68(0.005) 0.68(0.001) 0.86(0.011) 0.67(0.003)

Discussion

Main Findings

In this study, we have demonstrated the use of causal disparity analysis to expose the complex relationships and causal pathways linking sensitive attributes (such as race) to outcomes in real-world observational data (such as loan status or income), supplementing total variation (TV), also referred to as demographic parity. Our analysis is rooted in the assumptions of a basic causal graph, from which all findings are derived. Notably, our key finding reveals a direct causal link between race and loan status or income, which might not have been apparent from the observed disparities alone. In the Adult dataset, our analysis reveals the presence of indirect effects through mediators, a phenomenon that resonates with prior research by Binkytė et al. (2023). However, those authors' exploration of fairness measures across different causal discovery algorithms and causal paths demonstrated significant variability in the observed discrimination.
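For reference, TV can be written as a difference in outcome rates between the two sensitive categories. The expression below is a sketch under the assumption that $s_1$ and $s_2$ denote the two race categories and $y$ the positive outcome. The sign convention ($s_2$ minus $s_1$) is assumed to match the performance gaps reported earlier and may differ from the convention used in the formal definitions.

```latex
\mathrm{TV}_{s_1, s_2}(y) = P(y \mid s_2) - P(y \mid s_1)
```

With purely hypothetical rates $P(y \mid s_2) = 0.30$ and $P(y \mid s_1) = 0.22$, this gives $\mathrm{TV} = 0.08$. A nonzero TV on its own does not indicate how much of the gap flows through the direct path from race to the outcome, which is why the decomposition into direct and indirect effects is needed.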

Given the presence of direct causal effects in our datasets, we examined how the direct influence of race on outcomes varies across individuals. This variability led to the identification of four distinct sub-groups, each sharing similar characteristics except for race. In other words, within each sub-group, all covariates except race remained consistent, with race being hypothetically randomized. The ML model used in our study showed varying performance across these sub-groups. Sub-groups with larger positive direct causal effects, which exhibited larger disparities in outcomes attributed to race, experienced lower model performance. This performance gap within these sub-groups indicates potential unfairness and bias in the ML model, suggesting that race may be a factor contributing to disparate outcomes. In all three experiments, the larger gap in false negative rates for Sub-group 4, which is not in favor of non-whites and Asians, suggests that the classifier tends to incorrectly predict loan status as rejected when it is actually accepted among these individuals, compared to white individuals within the same sub-group. This indicates a bias in the predictions against non-white individuals. The HDMA-Asian dataset shows a similar disparity, where predictions are biased against non-Asians. Furthermore, for the Adult dataset, in addition to the large recall gap for Sub-group 4, there is a large gap in the true positive rates in Sub-group 1 in favour of white individuals. This implies that the classifier is more successful at correctly predicting high income among white individuals in Sub-group 1 than among non-white individuals within the same sub-group, suggesting a bias in favor of white individuals in predicting high income.

In essence, this is a nuanced finding that cannot be captured solely by dividing the entire sample into privileged and unprivileged groups based on the sensitive attribute alone, which is race in our case. Our findings accord with existing literature in two significant respects. First, by employing decomposed and structural causal analysis, our results resonate with a substantial body of research on mediating mechanisms that estimates both natural direct and indirect effects within the potential outcome framework (Jackson 2021; Park, Qin, and Lee 2022; Jeffries et al. 2019). Our causal methodology experiments follow the line of research pioneered by counterfactual causal fairness analysis (Plecko and Bareinboim 2022), which works on quantifying discrimination, decomposing variations, and deriving empirical measures of fairness from data. Second, in considering heterogeneity in causal effects, our approach and findings align with other studies that employ heterogeneous treatment effects and causal forests (Dandl et al. 2024). For instance, similar methodologies have been leveraged in analyzing environmental policy effects (Miller 2020), conducting cost-effectiveness analyses encompassing outcomes, costs, and net monetary benefits (Bonander and Svensson 2021), and assessing educational interventions and grading discrimination (Jin, Naghi, and Pick 2019).

Limitations and Future Directions

As ML advances at an unprecedented pace, its societal implications have attracted heightened scrutiny. Consequently, the importance of conducting disparity analysis has been emphasized in the contemporary landscape. While this study has provided valuable insights into causal disparity analysis, it is essential to acknowledge several limitations and explore potential avenues for improvement. The analysis focused on disparities related to a single protected attribute, race. This narrow focus may not fully capture the interplay of multiple factors contributing to discrimination and bias in real-world scenarios. Future research should consider incorporating intersectional disparity analysis, which examines how multiple protected attributes intersect and interact to shape outcomes. In line with this, future work should also involve a thorough exploration of diverse causal discovery algorithms and identification methods. The sole reliance on a basic causal graph framework in this study is a limitation, as it may oversimplify the causal relationships inherent in real-world data. Additionally, the datasets used in this study may not comprehensively represent the diversity and complexity of real-world populations. Limited diversity within the datasets can lead to biased results and may not encompass the full range of experiences and challenges faced by individuals from marginalized or underrepresented groups. Future work should involve utilizing more diverse and representative datasets, validating the findings within specific contexts, and identifying any context-specific factors that may influence fairness and bias.

To conclude, our study emphasized the imperative of delving into causal pathways, decomposing them, and assessing heterogeneity among individuals. This approach not only offers a comprehensive understanding of disparities within the data but also enables targeted interventions and strategies to promote fairness and equity.

References

  • Athey and Wager (2019) Athey, S.; and Wager, S. 2019. Estimating treatment effects with causal forests: An application. Observational studies, 5(2): 37–51.
  • Barocas, Hardt, and Narayanan (2023a) Barocas, S.; Hardt, M.; and Narayanan, A. 2023a. Fairness and Machine Learning: Limitations and Opportunities. MIT Press.
  • Barocas, Hardt, and Narayanan (2023b) Barocas, S.; Hardt, M.; and Narayanan, A. 2023b. Fairness and machine learning: Limitations and opportunities. MIT Press.
  • Becker and Kohavi (1996) Becker, B.; and Kohavi, R. 1996. Adult. UCI Machine Learning Repository. DOI: https://doi.org/10.24432/C5XW20.
  • Bellamy et al. (2018) Bellamy, R. K. E.; Dey, K.; Hind, M.; Hoffman, S. C.; Houde, S.; Kannan, K.; Lohia, P.; Martino, J.; Mehta, S.; Mojsilovic, A.; Nagar, S.; Ramamurthy, K. N.; Richards, J.; Saha, D.; Sattigeri, P.; Singh, M.; Varshney, K. R.; and Zhang, Y. 2018. AI Fairness 360: An Extensible Toolkit for Detecting, Understanding, and Mitigating Unwanted Algorithmic Bias.
  • Binkytė et al. (2023) Binkytė, R.; Makhlouf, K.; Pinzón, C.; Zhioua, S.; and Palamidessi, C. 2023. Causal discovery for fairness. In Workshop on Algorithmic Fairness through the Lens of Causality and Privacy, 7–22. PMLR.
  • Bird et al. (2020) Bird, S.; Dudík, M.; Edgar, R.; Horn, B.; Lutz, R.; Milan, V.; Sameki, M.; Wallach, H.; and Walker, K. 2020. Fairlearn: A toolkit for assessing and improving fairness in AI. Technical Report MSR-TR-2020-32, Microsoft.
  • Bonander and Svensson (2021) Bonander, C.; and Svensson, M. 2021. Using causal forests to assess heterogeneity in cost-effectiveness analysis. Health Economics, 30(8): 1818–1832.
  • Braveman et al. (2011) Braveman, P. A.; Kumanyika, S.; Fielding, J.; LaVeist, T.; Borrell, L. N.; Manderscheid, R.; and Troutman, A. 2011. Health disparities and health equity: the issue is justice. American journal of public health, 101(S1): S149–S155.
  • Coffman et al. (2023) Coffman, D. L.; Schuler, M. S.; Nguyen, T. Q.; and McCaffrey, D. F. 2023. Weighting Estimators for Causal Mediation. In Handbook of Matching and Weighting Adjustments for Causal Inference, 373–412. Chapman and Hall/CRC.
  • Crump et al. (2008) Crump, R. K.; Hotz, V. J.; Imbens, G. W.; and Mitnik, O. A. 2008. Nonparametric tests for treatment effect heterogeneity. The Review of Economics and Statistics, 90(3): 389–405.
  • Dandl et al. (2024) Dandl, S.; Haslinger, C.; Hothorn, T.; Seibold, H.; Sverdrup, E.; Wager, S.; and Zeileis, A. 2024. What makes forest-based heterogeneous treatment effect estimators work? The Annals of Applied Statistics, 18(1): 506–528.
  • Díaz-Rodríguez et al. (2023) Díaz-Rodríguez, N.; Del Ser, J.; Coeckelbergh, M.; de Prado, M. L.; Herrera-Viedma, E.; and Herrera, F. 2023. Connecting the dots in trustworthy Artificial Intelligence: From AI principles, ethics, and key requirements to responsible AI systems and regulation. Information Fusion, 101896.
  • Feldman et al. (2015) Feldman, M.; Friedler, S. A.; Moeller, J.; Scheidegger, C.; and Venkatasubramanian, S. 2015. Certifying and removing disparate impact. In proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining, 259–268.
  • Glymour and Hamad (2018) Glymour, M. M.; and Hamad, R. 2018. Causal thinking as a critical tool for eliminating social inequalities in health.
  • government (2016) government, U. S. 2016. Home Mortgage Disclosure Act.
  • Hariton and Locascio (2018) Hariton, E.; and Locascio, J. J. 2018. Randomised controlled trials—the gold standard for effectiveness research. BJOG: an international journal of obstetrics and gynaecology, 125(13): 1716.
  • Holland (1986) Holland, P. W. 1986. Statistics and Causal Inference. Journal of the American Statistical Association, 81(396): 945–960.
  • Jackson (2021) Jackson, J. W. 2021. Meaningful causal decompositions in health equity research: definition, identification, and estimation through a weighting framework. Epidemiology, 32(2): 282–290.
  • Jeffries et al. (2019) Jeffries, N.; Zaslavsky, A. M.; Diez Roux, A. V.; Creswell, J. W.; Palmer, R. C.; Gregorich, S. E.; Reschovsky, J. D.; Graubard, B. I.; Choi, K.; Pfeiffer, R. M.; et al. 2019. Methodological approaches to understanding causes of health disparities. American journal of public health, 109(S1): S28–S33.
  • Jin, Naghi, and Pick (2019) Jin, F. F.; Naghi, A.; and Pick, A. 2019. Heterogeneous Treatment Effects of Educational Interventions by using Random Forests.
  • Kavouras et al. (2024) Kavouras, L.; Tsopelas, K.; Giannopoulos, G.; Sacharidis, D.; Psaroudaki, E.; Theologitis, N.; Rontogiannis, D.; Fotakis, D.; and Emiris, I. 2024. Fairness Aware Counterfactuals for Subgroups. Advances in Neural Information Processing Systems, 36.
  • Kearns et al. (2018) Kearns, M.; Neel, S.; Roth, A.; and Wu, Z. S. 2018. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. In International conference on machine learning, 2564–2572. PMLR.
  • Kearns et al. (2019) Kearns, M.; Neel, S.; Roth, A.; and Wu, Z. S. 2019. An empirical study of rich subgroup fairness for machine learning. In Proceedings of the conference on fairness, accountability, and transparency, 100–109.
  • Khademi et al. (2019) Khademi, A.; Lee, S.; Foley, D.; and Honavar, V. 2019. Fairness in algorithmic decision making: An excursion through the lens of causality. In The World Wide Web Conference, 2907–2914.
  • Lee (2009) Lee, M.-j. 2009. Non-parametric tests for distributional treatment effect for randomly censored responses. Journal of the Royal Statistical Society Series B: Statistical Methodology, 71(1): 243–264.
  • MacKinnon, Fairchild, and Fritz (2007) MacKinnon, D. P.; Fairchild, A. J.; and Fritz, M. S. 2007. Mediation analysis. Annu. Rev. Psychol., 58: 593–614.
  • Mehrabi et al. (2021) Mehrabi, N.; Morstatter, F.; Saxena, N.; Lerman, K.; and Galstyan, A. 2021. A survey on bias and fairness in machine learning. ACM computing surveys (CSUR), 54(6): 1–35.
  • Miller (2020) Miller, S. 2020. Causal forest estimation of heterogeneous and time-varying environmental policy effects. Journal of Environmental Economics and Management, 103: 102337.
  • Morgan and Winship (2015) Morgan, S. L.; and Winship, C. 2015. Counterfactuals and causal inference. Cambridge University Press.
  • Nations (2023) Nations, U. 2023. Universal Declaration of Human Rights.
  • Park, Qin, and Lee (2022) Park, S.; Qin, X.; and Lee, C. 2022. Estimation and sensitivity analysis for causal decomposition in health disparity research. Sociological Methods & Research, 00491241211067516.
  • Pearl (2014) Pearl, J. 2014. Interpretation and identification of causal mediation. Psychological methods, 19(4): 459.
  • Pearl (2022) Pearl, J. 2022. Detecting latent heterogeneity. In Probabilistic and Causal Inference: The Works of Judea Pearl, 483–506. ACM Books.
  • Pfohl et al. (2023) Pfohl, S. R.; Harris, N.; Nagpal, C.; Madras, D.; Mhasawade, V.; Salaudeen, O. E.; Heller, K. A.; Koyejo, S.; and D’Amour, A. N. 2023. Understanding subgroup performance differences of fair predictors using causal models. In NeurIPS 2023 Workshop on Distribution Shifts: New Frontiers with Foundation Models.
  • Plecko and Bareinboim (2022) Plecko, D.; and Bareinboim, E. 2022. Causal fairness analysis. arXiv preprint arXiv:2207.11385.
  • Plecko and Bareinboim (2024) Plecko, D.; and Bareinboim, E. 2024. A Causal Framework for Decomposing Spurious Variations. Advances in Neural Information Processing Systems, 36.
  • Raza et al. (2024) Raza, S.; Ghuge, S.; Ding, C.; Dolatabadi, E.; and Pandya, D. 2024. FAIR Enough: Develop and Assess a FAIR-Compliant Dataset for Large Language Model Training? Data Intelligence, 6(2): 559–585.
  • Rubin (1974) Rubin, D. B. 1974. Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of educational Psychology, 66(5): 688.
  • Rubin (2005) Rubin, D. B. 2005. Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469): 322–331.
  • Saha et al. (2020) Saha, D.; Schumann, C.; Mcelfresh, D.; Dickerson, J.; Mazurek, M.; and Tschantz, M. 2020. Measuring non-expert comprehension of machine learning fairness metrics. In International Conference on Machine Learning, 8377–8387. PMLR.
  • Saleiro et al. (2018) Saleiro, P.; Kuester, B.; Stevens, A.; Anisfeld, A.; Hinkson, L.; London, J.; and Ghani, R. 2018. Aequitas: A Bias and Fairness Audit Toolkit. arXiv preprint arXiv:1811.05577.
  • Shui et al. (2022) Shui, C.; Xu, G.; Chen, Q.; Li, J.; Ling, C. X.; Arbel, T.; Wang, B.; and Gagné, C. 2022. On learning fairness and accuracy on multiple subgroups. Advances in Neural Information Processing Systems, 35: 34121–34135.
  • Verma and Rubin (2018) Verma, S.; and Rubin, J. 2018. Fairness definitions explained. In Proceedings of the international workshop on software fairness, 1–7.
  • Wager and Athey (2018) Wager, S.; and Athey, S. 2018. Estimation and inference of heterogeneous treatment effects using random forests. Journal of the American Statistical Association, 113(523): 1228–1242.
  • Willke et al. (2012) Willke, R. J.; Zheng, Z.; Subedi, P.; Althin, R.; and Mullins, C. D. 2012. From concepts, theory, and evidence of heterogeneity of treatment effects to methodological approaches: a primer. BMC medical research methodology, 12: 1–12.
  • Wu et al. (2019) Wu, Y.; Zhang, L.; Wu, X.; and Tong, H. 2019. Pc-fairness: A unified framework for measuring causality-based fairness. Advances in neural information processing systems, 32.
  • Yang, Cisse, and Koyejo (2020) Yang, F.; Cisse, M.; and Koyejo, S. 2020. Fairness with overlapping groups; a probabilistic perspective. Advances in neural information processing systems, 33: 4067–4078.
  • Zhang and Bareinboim (2018a) Zhang, J.; and Bareinboim, E. 2018a. Equality of opportunity in classification: A causal approach. Advances in neural information processing systems, 31.
  • Zhang and Bareinboim (2018b) Zhang, J.; and Bareinboim, E. 2018b. Fairness in decision-making—the causal explanation formula. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 32.
  • Zhang, Wu, and Wu (2016) Zhang, L.; Wu, Y.; and Wu, X. 2016. A causal framework for discovering and removing direct and indirect discrimination. arXiv preprint arXiv:1611.07509.

Appendix A Natural Effects

In Table 5, we provide estimations of natural effects using three methods—CFA-CRF (Plecko and Bareinboim 2022), CFA-MedDML (Plecko and Bareinboim 2022), and twangmediation (Coffman et al. 2023)—for all three datasets.

Table 5: Summary of natural effect measurements. Each value in the table is formatted as mean (standard deviation).
Adult HDMA-White HDMA-Asian
NDE NIE NDE NIE NDE NIE
CFA-CRF (Plecko and Bareinboim 2022) 0.016 (0.0001) 0.034 (0.0002) 0.062 (0.0001) -0.01 (0.0001) 0.041 (0.0001) -0.016 (0.0001)
CFA-MedDML (Plecko and Bareinboim 2022) 0.006 (0.0051) 0.044 (0.0019) 0.059 (0.0026) -0.005 (0.0003) 0.041 (0.003) -0.011 (0.0004)
twangmediation (Coffman et al. 2023) 0.017 (0.008) 0.032 (0.001) 0.065 (0.003) -0.007 (0.007) 0.048 (0.004) -0.179 (0.492)

Appendix B Histogram of ctf-DE and NDE Values

In Figure 5, we provide histogram plots of ctf-DE values for the three datasets: Adult, HDMA-White, and HDMA-Asian. Each histogram shows the distribution and spread of ctf-DE values within a dataset, which informs the choice of sub-group boundaries. In Figure 6, we provide histogram plots of NDE values for all three datasets as well.

Figure 5: Histogram of ctf-DE values: (a) Adult dataset, (b) HDMA-White dataset, (c) HDMA-Asian dataset.
Figure 6: Histogram of NDE values: (a) Adult dataset, (b) HDMA-White dataset, (c) HDMA-Asian dataset.