Intersectional Fairness in Large Language Models
Chaima Boufaied, Ronnie De Souza Santos, Ann Barcomb
TLDR
LLMs exhibit intersectional fairness issues, with accuracy shaped by stereotype alignment, especially in race-gender intersections, underscoring the need for evaluation that goes beyond accuracy alone.
Key contributions
- Systematically evaluated intersectional fairness in six LLMs across ambiguous and disambiguated contexts.
- LLM accuracy is influenced by stereotype alignment, especially in race-gender intersections.
- Subgroup fairness metrics reveal uneven outcome distributions across intersectional groups even when observed disparity is low (see the sketch after this list).
- LLM responses are inconsistent across repeated runs, and the answers that vary include stereotype-aligned ones.
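As a concrete illustration of the subgroup fairness idea above, the sketch below computes per-subgroup accuracy and the largest accuracy gap across intersectional groups. This is not the paper's code; the subgroup labels and records are made-up assumptions for demonstration.

```python
# Hypothetical sketch: per-subgroup accuracy and accuracy disparity across
# intersectional groups. Subgroup labels and records are illustrative only.
from collections import defaultdict

def subgroup_accuracy(records):
    """records: iterable of (subgroup, is_correct) pairs."""
    totals, correct = defaultdict(int), defaultdict(int)
    for subgroup, is_correct in records:
        totals[subgroup] += 1
        correct[subgroup] += int(is_correct)
    return {g: correct[g] / totals[g] for g in totals}

def accuracy_disparity(acc_by_group):
    """Largest accuracy gap between any two intersectional subgroups."""
    values = list(acc_by_group.values())
    return max(values) - min(values)

# Example with made-up race-gender subgroups and outcomes.
records = [
    ("Black woman", True), ("Black woman", False),
    ("Black man", True), ("Black man", True),
    ("white woman", True), ("white woman", True),
    ("white man", True), ("white man", False),
]
acc = subgroup_accuracy(records)
print(acc)                      # per-subgroup accuracy
print(accuracy_disparity(acc))  # 0.5 for this toy data
```

Even a small gap like this shows how aggregate accuracy can hide uneven outcomes once results are broken down by intersectional subgroup.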
Why it matters
This paper shows that apparent LLM competence can mask stereotype-linked biases, particularly at the intersection of demographic attributes such as race and gender. It underscores the need for evaluation beyond accuracy alone, combining bias, subgroup fairness, and consistency metrics, to build fair and reliable AI systems.
Original Abstract
Large Language Models (LLMs) are increasingly deployed in socially sensitive settings, raising concerns about fairness and biases, particularly across intersectional demographic attributes. In this paper, we systematically evaluate intersectional fairness in six LLMs using ambiguous and disambiguated contexts from two benchmark datasets. We assess LLM behavior using bias scores, subgroup fairness metrics, accuracy, and consistency through multi-run analysis across contexts and negative and non-negative question polarities. Our results show that while modern LLMs generally perform well in ambiguous contexts, this limits the informativeness of fairness metrics due to sparse non-unknown predictions. In disambiguated contexts, LLM accuracy is influenced by stereotype alignment, with models being more accurate when the correct answer reinforces a stereotype than when it contradicts it. This pattern is especially pronounced in race-gender intersections, where directional bias toward stereotypes is stronger. Subgroup fairness metrics further indicate that, despite low observed disparity in some cases, outcome distributions remain uneven across intersectional groups. Across repeated runs, responses also vary in consistency, including stereotype-aligned responses. Overall, our findings show that apparent model competence is partly associated with stereotype-consistent cues, and no evaluated LLM achieves consistently reliable or fair behavior across intersectional settings. These findings highlight the need for evaluation beyond accuracy, emphasizing the importance of combining bias, subgroup fairness, and consistency metrics across intersectional groups, contexts, and repeated runs.
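To make the evaluation dimensions in the abstract more concrete, here is a minimal, hedged sketch of splitting accuracy by whether the gold answer aligns with a stereotype and of measuring run-to-run consistency. The data layout and example values are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch (assumed data layout, not the paper's code): accuracy split by
# stereotype alignment of the gold answer, plus run-to-run consistency.
from statistics import mean

def accuracy_by_alignment(items):
    """items: dicts with boolean keys 'correct' and 'stereotype_aligned'."""
    aligned = [it["correct"] for it in items if it["stereotype_aligned"]]
    contra = [it["correct"] for it in items if not it["stereotype_aligned"]]
    return {"stereotype_aligned": mean(aligned),
            "stereotype_contradicting": mean(contra)}

def run_consistency(runs):
    """runs: list of answer lists (one list per run, same question order).
    Returns the fraction of questions answered identically in every run."""
    return mean(len(set(answers)) == 1 for answers in zip(*runs))

# Made-up example: two questions answered over three repeated runs.
items = [
    {"correct": True, "stereotype_aligned": True},
    {"correct": False, "stereotype_aligned": False},
]
runs = [["A", "B"], ["A", "C"], ["A", "B"]]
print(accuracy_by_alignment(items))  # higher accuracy on stereotype-aligned items
print(run_consistency(runs))         # 0.5: only the first question is stable
```

Reporting the aligned-versus-contradicting gap alongside a consistency score is one simple way to surface the stereotype-driven patterns the abstract describes, rather than relying on overall accuracy.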