Fairlogue: A Toolkit for Intersectional Fairness Analysis in Clinical Machine Learning Models
Nick Souligne, Vignesh Subbian
TLDR
Fairlogue is a Python toolkit for assessing intersectional fairness in clinical ML models, revealing compounded disparities that single-axis analyses miss.
Key contributions
- Introduces Fairlogue, a Python toolkit for intersectional fairness analysis in clinical ML.
- Provides observational, counterfactual, and generalized counterfactual fairness frameworks.
- Evaluated on EHR data for glaucoma surgery prediction, revealing substantial intersectional disparities.
- Identified larger fairness gaps than single-axis analyses, e.g., a demographic parity difference of 0.20 and equalized odds TPR/FPR gaps of 0.33/0.15 (a sketch of the gap computation follows this list).
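The intersectional extension is conceptually simple: compute each group-fairness metric over the cross-product of protected attributes (e.g., race × gender) rather than over each attribute alone, then take the worst-case gap. The sketch below is a minimal illustration of that idea under those assumptions; the function name and interface are hypothetical, not Fairlogue's actual API.

```python
# Minimal illustration of intersectional fairness gaps.
# NOTE: `intersectional_gaps` is a hypothetical helper for illustration,
# not Fairlogue's actual API.
import numpy as np
import pandas as pd

def intersectional_gaps(y_true, y_pred, groups: pd.DataFrame):
    """Max-minus-min gaps in selection rate (demographic parity),
    TPR, and FPR across intersectional subgroups.

    groups: one column per protected attribute (e.g., race, gender);
            rows align with y_true / y_pred.
    """
    df = pd.DataFrame({"y": np.asarray(y_true), "yhat": np.asarray(y_pred)})
    # An intersectional subgroup is the joint value of all protected
    # attributes (e.g., a specific race-gender combination), not race
    # or gender alone.
    df["subgroup"] = groups.astype(str).apply(" x ".join, axis=1).to_numpy()

    rows = {}
    for name, sub in df.groupby("subgroup"):
        sel = sub["yhat"].mean()                          # P(Yhat = 1 | group)
        pos, neg = sub[sub["y"] == 1], sub[sub["y"] == 0]
        tpr = pos["yhat"].mean() if len(pos) else np.nan  # true positive rate
        fpr = neg["yhat"].mean() if len(neg) else np.nan  # false positive rate
        rows[name] = (sel, tpr, fpr)

    table = pd.DataFrame(rows, index=["selection", "tpr", "fpr"]).T
    # Worst-case gap per metric: "selection" is the demographic parity
    # difference; "tpr" and "fpr" together form the equalized odds gaps.
    return table.max() - table.min(), table
```

Evaluating over the joint subgroups is what surfaces gaps that averaging over a single attribute can wash out, which is how the paper's intersectional gaps (0.20 demographic parity, 0.33/0.15 TPR/FPR) exceed their single-axis counterparts.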
Why it matters
Most fairness tools compare groups along one demographic axis at a time and can miss compounded disparities affecting intersectional populations. Fairlogue addresses this with a toolkit for quantifying and evaluating intersectional bias in clinical ML, supporting more equitable and trustworthy healthcare AI.
Original Abstract
Objective: Algorithmic fairness is essential for equitable and trustworthy machine learning in healthcare. Most fairness tools emphasize single-axis demographic comparisons and may miss compounded disparities affecting intersectional populations. This study introduces Fairlogue, a toolkit designed to operationalize intersectional fairness assessment in observational and counterfactual contexts within clinical settings.

Methods: Fairlogue is a Python-based toolkit composed of three components: 1) an observational framework extending demographic parity, equalized odds, and equal opportunity difference to intersectional populations; 2) a counterfactual framework evaluating fairness under treatment-based contexts; and 3) a generalized counterfactual framework assessing fairness under interventions on intersectional group membership. The toolkit was evaluated using electronic health record data from the All of Us Controlled Tier V8 dataset in a glaucoma surgery prediction task using logistic regression with race and gender as protected attributes.

Results: Observational analysis identified substantial intersectional disparities despite moderate model performance (AUROC = 0.709; accuracy = 0.651). Intersectional evaluation revealed larger fairness gaps than single-axis analyses, including demographic parity differences of 0.20 and equalized odds true positive and false positive rate gaps of 0.33 and 0.15, respectively. Counterfactual analysis using permutation-based null distributions produced unfairness ("u-value") estimates near zero, suggesting observed disparities were consistent with chance after conditioning on covariates.

Conclusion: Fairlogue provides a modular toolkit integrating observational and counterfactual methods for quantifying and evaluating intersectional bias in clinical machine learning workflows.
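The counterfactual component's permutation test can be sketched in a few lines. The abstract does not give the exact "u-value" formula, so the version below is an assumption: it takes the u-value as the observed disparity minus the mean disparity under random shuffling of subgroup labels, so values near zero indicate disparities consistent with chance. It also omits the conditioning on covariates that the paper performs; `permutation_u_value` and its defaults are hypothetical.

```python
# Hedged sketch of a permutation-based null for an intersectional disparity.
# ASSUMPTION: the "u-value" here is the observed gap minus the mean gap under
# label permutation; the paper's exact definition is not given in the abstract.
import numpy as np

def permutation_u_value(y_pred, subgroup, gap_fn=None, n_perm=1000, seed=0):
    """Compare an observed subgroup disparity to a shuffled-label null.

    y_pred:   binary predictions (array-like)
    subgroup: intersectional group label per row (array-like)
    gap_fn:   statistic mapping (y_pred, subgroup) -> scalar disparity;
              defaults to max-minus-min selection rate across subgroups.
    """
    rng = np.random.default_rng(seed)
    y_pred = np.asarray(y_pred)
    subgroup = np.asarray(subgroup)

    if gap_fn is None:
        def gap_fn(yhat, g):
            rates = [yhat[g == v].mean() for v in np.unique(g)]
            return max(rates) - min(rates)

    observed = gap_fn(y_pred, subgroup)
    # Shuffling subgroup labels breaks any real prediction-group association,
    # giving the disparity's distribution under the no-unfairness null.
    null = np.array([gap_fn(y_pred, rng.permutation(subgroup))
                     for _ in range(n_perm)])
    return observed - null.mean(), observed, null
```

Under this reading, a u-value near zero, as the paper reports, means the observed intersectional gap is about what label shuffling alone would produce once the model's predictions are fixed.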