Fairness Audits of Institutional Risk Models in Deployed ML Pipelines
Kelly McConvey, Dipto Das, Maya Ghai, Angelina Zhai, Rosa Lee, et al.
TLDR
This paper audits a deployed Early Warning System and reveals systematic fairness issues: younger, male, and international students are over-flagged for support, while older and female students with comparable dropout risk are under-identified.
Key contributions
- Developed a replica-based audit methodology for deployed institutional ML systems.
- Audited an Early Warning System (EWS) for student support, replicating its model with institutional data.
- Revealed systematic misallocation: younger, male, and international students are disproportionately flagged.
- Showed that percentile-based post-processing amplifies these disparities, underscoring the need to evaluate construct validity alongside statistical fairness (see the sketch after this list).
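As a rough sketch of the metric stage of such an audit (not the paper's actual code), the function below computes per-group selection rates and false positive rates from a table of flags and outcomes. All identifiers here (`group_fairness_report`, the columns `flagged` and `dropped_out`) are hypothetical stand-ins for the paper's institutional data.

```python
import pandas as pd

def group_fairness_report(df: pd.DataFrame, group_col: str,
                          flag_col: str = "flagged",
                          outcome_col: str = "dropped_out") -> pd.DataFrame:
    """Selection rate and false positive rate per group for a binary risk flag.

    Hypothetical columns: `flag_col` is 1 if the EWS flagged the student,
    `outcome_col` is 1 if the student actually dropped out.
    """
    rows = []
    for group, g in df.groupby(group_col):
        succeeded = g[g[outcome_col] == 0]  # students who ultimately succeeded
        rows.append({
            group_col: group,
            "n": len(g),
            "selection_rate": g[flag_col].mean(),  # P(flagged | group)
            "fpr": succeeded[flag_col].mean(),     # flagged despite succeeding
        })
    report = pd.DataFrame(rows)
    # Gaps relative to the least-affected group; nonzero gaps signal disparity.
    report["selection_gap"] = report["selection_rate"] - report["selection_rate"].min()
    report["fpr_gap"] = report["fpr"] - report["fpr"].min()
    return report
```

A high `fpr` for a group means many of its members are flagged even though they succeed, which is the over-flagging pattern the audit reports for younger, male, and international students; a low `selection_rate` for a group with comparable dropout risk corresponds to under-identification.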
Why it matters
This work provides a replicable methodology for auditing institutional ML systems, demonstrating how fairness disparities emerge and compound across the entire pipeline. It highlights the critical importance of evaluating construct validity alongside statistical fairness in deployed models that shape how support resources are allocated.
Original Abstract
Fairness audits of institutional risk models are critical for understanding how deployed machine learning pipelines allocate resources. Drawing on multi-year collaboration with Centennial College, where our prior ethnographic work introduced the ASP-HEI Cycle, we present a replica-based audit of a deployed Early Warning System (EWS), replicating its model using institutional training data and design specifications. We evaluate disparities by gender, age, and residency status across the full pipeline (training data, model predictions, and post-processing) using standard fairness metrics. Our audit reveals systematic misallocation: younger, male, and international students are disproportionately flagged for support, even when many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. This work provides a replicable methodology for auditing institutional ML systems and shows how disparities emerge and compound across stages, highlighting the importance of evaluating construct validity alongside statistical fairness. It contributes one empirical thread to a broader program investigating algorithms, student data, and power in higher education.
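The abstract's post-processing point can be made concrete: percentile tiering assigns risk by rank alone, discarding the spread of the underlying probabilities. A minimal sketch, assuming a simple quartile-style tiering (the deployed EWS's exact tier definitions are not reproduced here):

```python
import numpy as np
import pandas as pd

def percentile_tiers(probs, n_tiers: int = 4) -> pd.Series:
    """Collapse predicted dropout probabilities into percentile-based risk tiers.

    Hypothetical post-processing step: tier membership depends only on a
    student's rank, not on how far apart the underlying probabilities are.
    """
    percentiles = pd.Series(probs).rank(pct=True)      # percentile in (0, 1]
    return np.ceil(percentiles * n_tiers).astype(int)  # 1 = lowest-risk tier

# Heterogeneous probabilities collapse into shared tiers:
probs = [0.05, 0.10, 0.51, 0.52, 0.53, 0.54, 0.55, 0.95]
print(percentile_tiers(probs).tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
# A 0.55 and a 0.95 probability share the top tier: the 0.40 gap is erased,
# which is how tiering can mask within-tier heterogeneity across groups.
```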