Fairness Audits of Institutional Risk Models in Deployed ML Pipelines
Kelly McConvey, Dipto Das, Maya Ghai, Angelina Zhai, Rosa Lee, et al.
TLDR
This paper audits a deployed Early Warning System and reveals systematic fairness issues: younger, male, and international students are over-flagged for support, while older and female students with comparable dropout risk are under-identified.
Key contributions
- Developed a replica-based audit methodology for deployed institutional ML systems.
- Audited an Early Warning System (EWS) for student support, replicating its model with institutional data.
- Revealed systematic misallocation: younger, male, and international students are disproportionately flagged.
- Showed that percentile-based post-processing amplifies these disparities, underscoring the need to evaluate construct validity alongside statistical fairness (see the sketch after this list).
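As a rough sketch of the metric stage of such an audit (not the paper's actual code), the function below computes per-group selection rates and false positive rates from a table of flags and outcomes. All identifiers here (`group_fairness_report`, the columns `flagged` and `dropped_out`) are hypothetical stand-ins for the paper's institutional data.

```python
import pandas as pd

def group_fairness_report(df: pd.DataFrame, group_col: str,
                          flag_col: str = "flagged",
                          outcome_col: str = "dropped_out") -> pd.DataFrame:
    """Selection rate and false positive rate per group for a binary risk flag.

    Hypothetical columns: `flag_col` is 1 if the EWS flagged the student,
    `outcome_col` is 1 if the student actually dropped out.
    """
    rows = []
    for group, g in df.groupby(group_col):
        succeeded = g[g[outcome_col] == 0]  # students who ultimately succeeded
        rows.append({
            group_col: group,
            "n": len(g),
            "selection_rate": g[flag_col].mean(),  # P(flagged | group)
            "fpr": succeeded[flag_col].mean(),     # flagged despite succeeding
        })
    report = pd.DataFrame(rows)
    # Gaps relative to the least-affected group; nonzero gaps signal disparity.
    report["selection_gap"] = report["selection_rate"] - report["selection_rate"].min()
    report["fpr_gap"] = report["fpr"] - report["fpr"].min()
    return report
```

A high `fpr` for a group means many of its members are flagged even though they succeed, which is the over-flagging pattern the audit reports for younger, male, and international students; a low `selection_rate` for a group with comparable dropout risk corresponds to under-identification.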
Why it matters
This work provides a replicable methodology for auditing institutional ML systems, demonstrating how fairness disparities emerge and compound across the entire pipeline. It highlights the critical importance of evaluating construct validity alongside statistical fairness in deployed models that shape how support resources are allocated.
Original Abstract
Fairness audits of institutional risk models are critical for understanding how deployed machine learning pipelines allocate resources. Drawing on multi-year collaboration with Centennial College, where our prior ethnographic work introduced the ASP-HEI Cycle, we present a replica-based audit of a deployed Early Warning System (EWS), replicating its model using institutional training data and design specifications. We evaluate disparities by gender, age, and residency status across the full pipeline (training data, model predictions, and post-processing) using standard fairness metrics. Our audit reveals systematic misallocation: younger, male, and international students are disproportionately flagged for support, even when many ultimately succeed, while older and female students with comparable dropout risk are under-identified. Post-processing amplifies these disparities by collapsing heterogeneous probabilities into percentile-based risk tiers. This work provides a replicable methodology for auditing institutional ML systems and shows how disparities emerge and compound across stages, highlighting the importance of evaluating construct validity alongside statistical fairness. It contributes one empirical thread to a broader program investigating algorithms, student data, and power in higher education.
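The abstract's post-processing point can be made concrete: percentile tiering assigns risk by rank alone, discarding the spread of the underlying probabilities. A minimal sketch, assuming a simple quartile-style tiering (the deployed EWS's exact tier definitions are not reproduced here):

```python
import numpy as np
import pandas as pd

def percentile_tiers(probs, n_tiers: int = 4) -> pd.Series:
    """Collapse predicted dropout probabilities into percentile-based risk tiers.

    Hypothetical post-processing step: tier membership depends only on a
    student's rank, not on how far apart the underlying probabilities are.
    """
    percentiles = pd.Series(probs).rank(pct=True)      # percentile in (0, 1]
    return np.ceil(percentiles * n_tiers).astype(int)  # 1 = lowest-risk tier

# Heterogeneous probabilities collapse into shared tiers:
probs = [0.05, 0.10, 0.51, 0.52, 0.53, 0.54, 0.55, 0.95]
print(percentile_tiers(probs).tolist())  # [1, 1, 2, 2, 3, 3, 4, 4]
# A 0.55 and a 0.95 probability share the top tier: the 0.40 gap is erased,
# which is how tiering can mask within-tier heterogeneity across groups.
```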