Revisiting Fairness Impossibility with Endogenous Behavior
Elizabeth Maggie Penn, John W. Patty
TLDR
By modeling endogenous behavioral responses to classification, this paper shows that error-rate balance and predictive parity can be reconciled, but only at the cost of a new tradeoff: attaching different consequences (stakes) to identical classifications across groups.
Key contributions
- Revisits classic fairness impossibilities by modeling strategic, endogenous behavioral responses to classification stakes.
- Demonstrates that the well-known incompatibility between error-rate balance and predictive parity disappears in this strategic setting (both criteria are sketched in code after this list).
- Shows that achieving this reconciliation requires treating groups differently in the consequences (stakes) attached to identical classifications.
- Proposes a two-stage design: standardize statistical performance, then adjust stakes to induce comparable behavior.
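For readers new to the two criteria, here is a minimal, self-contained sketch. It is not code from the paper (which is game-theoretic rather than computational); the data, group names, and threshold are illustrative. It reproduces the classic static tension: with behavior held fixed and base rates unequal, equal error rates force unequal predictive values.

```python
# Illustrative sketch of the two statistical criteria the paper reconciles.
# Data, group names, and the threshold are invented for demonstration only.
import numpy as np

def group_metrics(y_true, y_pred):
    """Error rates and predictive value from binary labels/predictions."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return {
        "FPR": fp / (fp + tn),   # false positive rate
        "FNR": fn / (fn + tp),   # false negative rate
        "PPV": tp / (tp + fp),   # predictive parity compares this across groups
    }

# Error-rate balance: equal FPR and FNR across groups.
# Predictive parity: equal PPV across groups.
# Both groups get the same noisy signal and the same threshold, so their
# error rates match; unequal base rates then force their PPVs apart.
rng = np.random.default_rng(0)
for group, base_rate in [("A", 0.3), ("B", 0.6)]:
    y = rng.binomial(1, base_rate, size=10_000)
    scores = rng.normal(loc=y, scale=1.0)      # noisy signal of the label
    y_hat = (scores > 0.5).astype(int)         # common decision threshold
    print(group, {k: round(v, 3) for k, v in group_metrics(y, y_hat).items()})
```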
Why it matters
This paper reframes algorithmic fairness by incorporating endogenous human behavior and the 'stakes' attached to classification. It reveals new tradeoffs, showing that equitable systems must treat human consequences as primary design variables, not just algorithm outputs.
Original Abstract
In many real-world settings, institutions can and do adjust the consequences attached to algorithmic classification decisions, such as the size of fines, sentence lengths, or benefit levels. We refer to these consequences as the stakes associated with classification. These stakes can give rise to behavioral responses to classification, as people adjust their actions in anticipation of how they will be classified. Much of the algorithmic fairness literature evaluates classification outcomes while holding behavior fixed, treating behavioral differences across groups as exogenous features of the environment. Under this assumption, the stakes of classification play no role in shaping outcomes. We revisit classic impossibility results in algorithmic fairness in a setting where people respond strategically to classification. We show that, in this environment, the well-known incompatibility between error-rate balance and predictive parity disappears, but only by potentially introducing a qualitatively different form of unequal treatment. Concretely, we construct a two-stage design in which a classifier first standardizes its statistical performance across groups, and then adjusts stakes so as to induce comparable patterns of behavior. This requires treating groups differently in the consequences attached to identical classification decisions. Our results demonstrate that fairness in strategic settings cannot be assessed solely by how algorithms map data into decisions. Rather, our analysis treats the human consequences of classification as primary design variables, introduces normative criteria governing their use, and shows that their interaction with statistical fairness criteria generates qualitatively new tradeoffs. Our aim is to make these tradeoffs precise and explicit.
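For context, the incompatibility the abstract invokes is usually credited to Chouldechova (2017) and follows from an accounting identity. For a binary classifier evaluated on a group with base rate $p$,

$$\mathrm{FPR} \;=\; \frac{p}{1-p}\cdot\frac{1-\mathrm{PPV}}{\mathrm{PPV}}\cdot\bigl(1-\mathrm{FNR}\bigr),$$

so if two groups share the same FPR and FNR but differ in $p$, their PPVs must differ. Read against this identity, the intuition behind the paper's two-stage design is that once stakes induce comparable patterns of behavior across groups, base rates are no longer fixed, unequal features of the environment, and the identity ceases to force a conflict.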