ArXiv TLDR

Pretraining Exposure Explains Popularity Judgments in Large Language Models

arXiv: 2605.12382

Jamshid Mozafari, Bhawna Piryani, Adam Jatowt

cs.CL

TLDR

LLMs' popularity judgments are primarily driven by pretraining data exposure, not external popularity, as shown by analyzing OLMo and Dolma.

Key contributions

  • First direct, large-scale analysis of popularity bias using fully observable pretraining data (OLMo/Dolma).
  • Computed entity-level exposure statistics across 7.4 trillion tokens for 2,000 diverse entities.
  • Found LLM popularity judgments align more closely with pretraining exposure than external popularity signals.
  • This alignment is stronger for larger models and persists in the long tail where Wikipedia is unreliable.
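The exposure statistics behind the first two contributions amount to counting entity mentions over the pretraining corpus. Below is a minimal, hypothetical sketch of that idea using case-insensitive surface-form matching over an iterable of documents; the paper's actual pipeline over Dolma's 7.4T tokens is far more involved (tokenization, entity linking, distributed counting), so treat this as illustration only.

```python
from collections import Counter

def exposure_counts(docs, entities):
    """Count surface-form mentions of each entity across a corpus.

    Simplified stand-in for entity-level exposure statistics:
    case-insensitive substring counts over an iterable of documents.
    """
    counts = Counter()
    needles = {e: e.lower() for e in entities}
    for doc in docs:
        text = doc.lower()
        for entity, needle in needles.items():
            counts[entity] += text.count(needle)
    return counts

# Toy corpus to show the interface
docs = [
    "Paris is the capital of France. Paris hosts the Louvre.",
    "The Eiffel Tower is in Paris.",
]
print(exposure_counts(docs, ["Paris", "Louvre"]))
```

In practice one would normalize by corpus size and handle aliases (e.g. "NYC" vs "New York City") before comparing exposure against external popularity signals.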

Why it matters

This paper pins down the source of popularity bias in LLMs: pretraining data exposure rather than external popularity signals. Because exposure can be measured directly in open corpora such as Dolma, the finding gives researchers a concrete handle for diagnosing, and ultimately mitigating, popularity bias in future models.

Original Abstract

Large language models (LLMs) exhibit systematic preferences for well-known entities, a phenomenon often attributed to popularity bias. However, the extent to which these preferences reflect real-world popularity versus statistical exposure during pretraining remains unclear, largely due to the inaccessibility of most training corpora. We provide the first direct, large-scale analysis of popularity bias grounded in fully observable pretraining data. Leveraging the open OLMo models and their complete pretraining corpus, Dolma, we compute precise entity-level exposure statistics across 7.4 trillion tokens. We analyze 2,000 entities spanning five types (Person, Location, Organization, Art, Product) and compare pretraining exposure against Wikipedia pageviews and two elicited LLM popularity signals: direct scalar estimation and pairwise comparison. Our results show that pretraining exposure strongly correlates with Wikipedia popularity, validating exposure as a meaningful proxy for real-world salience during the training period. More importantly, we find that LLM popularity judgments align more closely with exposure than with Wikipedia, especially when elicited via pairwise comparisons. This alignment is strongest for larger models and persists in the long tail, where Wikipedia popularity becomes unreliable. Overall, our findings demonstrate that popularity priors in LLMs are primarily shaped by pretraining statistics rather than external popularity signals, offering concrete evidence that data exposure plays a central role in driving popularity bias.
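The abstract's central comparison, whether LLM popularity judgments track pretraining exposure or Wikipedia pageviews more closely, is a question of rank agreement between signals. A stdlib-only Spearman correlation sketch is below; the paper's exact metric and tie-handling may differ, so this is just an illustration of the comparison.

```python
def rankdata(values):
    """Assign ranks (1-based), averaging ranks over ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank for the tied run
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = rankdata(x), rankdata(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy example: exposure counts vs. an elicited popularity signal
exposure = [120_000, 5_000, 900, 40]
llm_scores = [0.95, 0.60, 0.30, 0.05]
print(spearman(exposure, llm_scores))  # perfectly monotone -> 1.0
```

Running the same correlation with Wikipedia pageviews in place of `llm_scores` (and over the long-tail entities separately) would reproduce the kind of comparison the abstract describes.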
