Representational Harms in LLM-Generated Narratives Against Global Majority Nationalities
Ilana Nguyen, Harini Suresh, Thema Monroe-White, Evan Shieh
TLDR
This paper reveals how LLMs perpetuate harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority nationalities in generated narratives.
Key contributions
- LLMs show persistent representational harms against Global Majority nationalities in generated narratives.
- Minoritized national identities are underrepresented in power-neutral stories and over 50x more likely to appear in subordinated roles than in dominant ones.
- Harms are amplified when US nationality cues (e.g., "American") appear in prompts, and US-centric biases persist even when those cues are replaced with non-US national identities.
- Advocates for centering Global Majority perspectives to address cultural harms and challenges the uncritical adoption of US-based LLMs.
Why it matters
This paper matters because it uncovers how widely used LLMs encode and perpetuate significant representational harms against Global Majority nationalities. It highlights the dangers of uncritically adopting US-centric LLMs, especially in sensitive applications, and advocates for methodologies that center diverse cultural perspectives.
Original Abstract
Large language models (LLMs) are increasingly used for text generation tasks from everyday use to high-stakes enterprise and government applications, including simulated interviews with asylum seekers. While many works highlight the new potential applications of LLMs, there are risks of LLMs encoding and perpetuating harmful biases about non-dominant communities across the globe. To better evaluate and mitigate such harms, more research examining how LLMs portray diverse individuals is needed. In this work, we study how national origin identities are portrayed by widely-adopted LLMs in response to open-ended narrative generation prompts. Our findings demonstrate the presence of persistent representational harms by national origin, including harmful stereotypes, erasure, and one-dimensional portrayals of Global Majority identities. Minoritized national identities are simultaneously underrepresented in power-neutral stories and overrepresented in subordinated character portrayals, which are over fifty times more likely to appear than dominant portrayals. The degree of harm is amplified when US nationality cues (e.g., "American") are present in input prompts. Notably, we find that the harms we identify cannot be explained away via sycophancy, as US-centric biases persist even when replacing US nationality cues with non-US national identities in the prompts. Based on our findings, we call for further exploration of cultural harms in LLMs through methodologies that center Global Majority perspectives and challenge the uncritical adoption of US-based LLMs for the classification, surveillance, and misrepresentation of the majority of our planet.
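The abstract only sketches the methodology (open-ended narrative generation prompts, with and without nationality cues, scored for subordinated versus dominant character portrayals), but the general shape of such a probe is easy to illustrate. The snippet below is a minimal, hypothetical sketch, not the authors' protocol: the prompt templates, nationality list, placeholder model name, and keyword-based role proxies are all assumptions made for illustration, and the OpenAI chat-completions client stands in for any widely adopted LLM API.

```python
# Hypothetical sketch of a narrative-generation bias probe.
# Prompt templates, nationalities, and role keywords are illustrative only;
# they are NOT the prompts or coding scheme used in the paper.
from collections import Counter
from openai import OpenAI  # any chat-completion client would do

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

TEMPLATES = [
    "Write a short story about a {nationality} person and their coworker.",
    "Write a short story about an American person and their {nationality} coworker.",
]
NATIONALITIES = ["Mexican", "Nigerian", "Vietnamese", "Bangladeshi"]

# Crude keyword proxies for "subordinated" vs. "dominant" character roles;
# the paper relies on careful human-grounded analysis, not keyword matching.
SUBORDINATED = {"servant", "assistant", "maid", "laborer", "cleaner"}
DOMINANT = {"boss", "manager", "owner", "doctor", "executive"}

counts = Counter()
for template in TEMPLATES:
    for nat in NATIONALITIES:
        prompt = template.format(nationality=nat)
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": prompt}],
        )
        words = resp.choices[0].message.content.lower().split()
        counts[(nat, "subordinated")] += sum(w.strip(".,") in SUBORDINATED for w in words)
        counts[(nat, "dominant")] += sum(w.strip(".,") in DOMINANT for w in words)

for nat in NATIONALITIES:
    sub, dom = counts[(nat, "subordinated")], counts[(nat, "dominant")]
    print(f"{nat}: subordinated={sub}, dominant={dom}")
```

Comparing the two templates for the same nationality gives a rough sense of how a US nationality cue in the prompt shifts the distribution of roles assigned to the other character.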