Fairlogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models
Nick Souligne, Vignesh Subbian
TLDR
Fairlogue is a Python toolkit for assessing intersectional fairness in clinical ML models, revealing compounded disparities that single-axis analyses miss.
Key contributions
- Introduces Fairlogue, a Python toolkit for intersectional fairness analysis in clinical ML.
- Provides observational, counterfactual, and generalized counterfactual fairness frameworks.
- Evaluated on EHR data for glaucoma surgery prediction, revealing substantial intersectional disparities.
- Identified larger fairness gaps than single-axis analyses, e.g., a demographic parity difference of 0.20 and equalized odds TPR/FPR gaps of 0.33/0.15 (a sketch of the gap computation follows this list).
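The intersectional extension is conceptually simple: compute each group-fairness metric over the cross-product of protected attributes (e.g., race × gender) rather than over each attribute alone, then take the worst-case gap. The sketch below is a minimal illustration of that idea under those assumptions; the function name and interface are hypothetical, not Fairlogue's actual API.

```python
# Minimal illustration of intersectional fairness gaps.
# NOTE: `intersectional_gaps` is a hypothetical helper for illustration,
# not Fairlogue's actual API.
import numpy as np
import pandas as pd

def intersectional_gaps(y_true, y_pred, groups: pd.DataFrame):
    """Max-minus-min gaps in selection rate (demographic parity),
    TPR, and FPR across intersectional subgroups.

    groups: one column per protected attribute (e.g., race, gender);
            rows align with y_true / y_pred.
    """
    df = pd.DataFrame({"y": np.asarray(y_true), "yhat": np.asarray(y_pred)})
    # An intersectional subgroup is the joint value of all protected
    # attributes (e.g., a specific race-gender combination), not race
    # or gender alone.
    df["subgroup"] = groups.astype(str).apply(" x ".join, axis=1).to_numpy()

    rows = {}
    for name, sub in df.groupby("subgroup"):
        sel = sub["yhat"].mean()                          # P(Yhat = 1 | group)
        pos, neg = sub[sub["y"] == 1], sub[sub["y"] == 0]
        tpr = pos["yhat"].mean() if len(pos) else np.nan  # true positive rate
        fpr = neg["yhat"].mean() if len(neg) else np.nan  # false positive rate
        rows[name] = (sel, tpr, fpr)

    table = pd.DataFrame(rows, index=["selection", "tpr", "fpr"]).T
    # Worst-case gap per metric: "selection" is the demographic parity
    # difference; "tpr" and "fpr" together form the equalized odds gaps.
    return table.max() - table.min(), table
```

Evaluating over the joint subgroups is what surfaces gaps that averaging over a single attribute can wash out, which is how the paper's intersectional gaps (0.20 demographic parity, 0.33/0.15 TPR/FPR) exceed their single-axis counterparts.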
Why it matters
Most fairness tools compare groups along one demographic axis at a time and can miss compounded disparities affecting intersectional populations. Fairlogue addresses this with a toolkit for quantifying and evaluating intersectional bias in clinical ML, supporting more equitable and trustworthy healthcare AI.
Original Abstract
Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces Fairlogue, a toolkit designed to operationalize intersectional fairness assessment in observational and counterfactual contexts within clinical settings.

Methods: Fairlogue is a Python-based toolkit composed of three components: 1) an observational framework extending demographic parity, equalized odds, and equal opportunity difference to intersectional populations; 2) a counterfactual framework evaluating fairness under treatment-based contexts; and 3) a generalized counterfactual framework assessing fairness under interventions on intersectional group membership. The toolkit was evaluated using electronic health record data from the All of Us Controlled Tier V8 dataset in a glaucoma surgery prediction task using logistic regression with race and gender as protected attributes.

Results: Observational analysis identified substantial intersectional disparities despite moderate model performance (AUROC = 0.709; accuracy = 0.651). Intersectional evaluation revealed larger fairness gaps than single-axis analyses, including demographic parity differences of 0.20 and equalized odds true positive and false positive rate gaps of 0.33 and 0.15, respectively. Counterfactual analysis using permutation-based null distributions produced unfairness ("u-value") estimates near zero, suggesting observed disparities were consistent with chance after conditioning on covariates.

Conclusion: Fairlogue provides a modular toolkit integrating observational and counterfactual methods for quantifying and evaluating intersectional bias in clinical machine learning workflows.
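The counterfactual component's permutation test can be sketched in a few lines. The abstract does not give the exact "u-value" formula, so the version below is an assumption: it takes the u-value as the observed disparity minus the mean disparity under random shuffling of subgroup labels, so values near zero indicate disparities consistent with chance. It also omits the conditioning on covariates that the paper performs; `permutation_u_value` and its defaults are hypothetical.

```python
# Hedged sketch of a permutation-based null for an intersectional disparity.
# ASSUMPTION: the "u-value" here is the observed gap minus the mean gap under
# label permutation; the paper's exact definition is not given in the abstract.
import numpy as np

def permutation_u_value(y_pred, subgroup, gap_fn=None, n_perm=1000, seed=0):
    """Compare an observed subgroup disparity to a shuffled-label null.

    y_pred:   binary predictions (array-like)
    subgroup: intersectional group label per row (array-like)
    gap_fn:   statistic mapping (y_pred, subgroup) -> scalar disparity;
              defaults to max-minus-min selection rate across subgroups.
    """
    rng = np.random.default_rng(seed)
    y_pred = np.asarray(y_pred)
    subgroup = np.asarray(subgroup)

    if gap_fn is None:
        def gap_fn(yhat, g):
            rates = [yhat[g == v].mean() for v in np.unique(g)]
            return max(rates) - min(rates)

    observed = gap_fn(y_pred, subgroup)
    # Shuffling subgroup labels breaks any real prediction-group association,
    # giving the disparity's distribution under the no-unfairness null.
    null = np.array([gap_fn(y_pred, rng.permutation(subgroup))
                     for _ in range(n_perm)])
    return observed - null.mean(), observed, null
```

Under this reading, a u-value near zero, as the paper reports, means the observed intersectional gap is about what label shuffling alone would produce once the model's predictions are fixed.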