ArXiv TLDR

The Algorithmic Caricature: Auditing LLM-Generated Political Discourse Across Crisis Events

arXiv:2605.12452

Gunjan, Sidahmed Benabderrahmane, Talal Rahwan

cs.CL · cs.AI · cs.CY

TLDR

This paper audits LLM-generated political discourse during crises and finds that it lacks population-level realism compared with observed online discourse.

Key contributions

  • Compares LLM-generated political discourse with observed human discourse across nine diverse crisis events.
  • Identifies synthetic discourse as more negative, structurally more regular, and lexically more abstract than human text.
  • Shows that realism gaps are larger for fast-moving, decentralized crises and smaller for formal, institutionally mediated events.
  • Introduces the "Caricature Gap" measure and a population-level auditing framework (see the sketch after this list).
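The summary describes the Caricature Gap only as a simple event-level measure built from per-dimension mean gaps and dispersion evidence; the exact definition is not given here. A minimal sketch of one plausible reading, averaging standardized mean differences across the audit dimensions (the function names and the aggregation rule are illustrative assumptions, not the paper's definition):

```python
import numpy as np

def dimension_gap(observed: np.ndarray, synthetic: np.ndarray) -> float:
    """Standardized mean difference for one audit dimension.

    Inputs are per-post feature values (e.g., sentiment scores or
    post lengths) for the same crisis event. NOTE: an assumed form,
    not the paper's published formula.
    """
    pooled_sd = np.sqrt((observed.var(ddof=1) + synthetic.var(ddof=1)) / 2)
    return abs(float(observed.mean() - synthetic.mean())) / float(pooled_sd)

def caricature_gap(obs_features: dict, syn_features: dict) -> float:
    """Average the per-dimension gaps into one event-level score."""
    gaps = [dimension_gap(obs_features[d], syn_features[d])
            for d in obs_features]
    return float(np.mean(gaps))

# Hypothetical usage for one event, with simulated per-post features.
rng = np.random.default_rng(0)
obs = {"sentiment": rng.normal(0.0, 0.5, 1000),
       "length": rng.lognormal(3.0, 1.0, 1000)}
syn = {"sentiment": rng.normal(-0.2, 0.2, 1000),
       "length": rng.lognormal(3.0, 0.4, 1000)}
print(caricature_gap(obs, syn))
```

Under this reading, a larger score means the synthetic corpus deviates more from the observed population for that event, which is consistent with the paper's finding that gaps are largest for fast-moving, decentralized crises.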

Why it matters

This research highlights a critical limitation of LLMs in generating realistic political discourse, especially during crises: the failure is not one of grammar or fluency but of population realism. The population-level auditing method it introduces complements traditional AI-text detection and offers a way to assess the social realism of generated text, which is crucial for understanding and mitigating its potential societal impact.

Original Abstract

Large Language Models (LLMs) can generate fluent political text at scale, raising concerns about synthetic discourse during crises and social conflict. Existing AI-text detection often focuses on sentence-level cues such as perplexity, burstiness, or token irregularities, but these signals may weaken as generative systems improve. We instead adopt a Computational Social Science perspective and ask whether synthetic political discourse behaves like an observed online population. We construct a paired corpus of 1,789,406 posts across nine crisis events: COVID-19, the Jan. 6 Capitol attack, the 2020 and 2024 U.S. elections, Dobbs/Roe v. Wade, the 2020 BLM protests, U.S. midterms, the Utah shooting, and the U.S.-Iran war. For each event, we compare observed discourse from social platforms with synthetic discourse generated for the same context. We evaluate four dimensions: emotional intensity, structural regularity, lexical-ideological framing, and cross-event dependency, using mean gaps and dispersion evidence. Across events, synthetic discourse is fluent but population-level unrealistic. It is generally more negative and less dispersed in sentiment, structurally more regular, and lexically more abstract than observed discourse. Observed discourse instead shows broader emotional variation, longer-tailed structural distributions, and more context-specific, colloquial lexical markers. These differences are event-dependent: larger for fast-moving, decentralized crises and smaller for formal or institutionally mediated events. We summarize them with a simple event-level measure, the Caricature Gap. Our findings suggest that the main limitation of synthetic political discourse is not grammar or fluency, but reduced population realism. Population-level auditing complements traditional text-detection and provides a CSS framework for evaluating the social realism of generated discourse.
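The abstract says each dimension is evaluated with mean gaps and dispersion evidence but does not name the measurement tools. As an illustration of what that comparison could look like for the emotional-intensity dimension, here is a minimal sketch using VADER as a stand-in sentiment scorer; both the choice of VADER and the `emotional_intensity_gap` helper are our assumptions, not the paper's method.

```python
from statistics import mean, stdev
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def compound_scores(posts: list[str]) -> list[float]:
    # VADER compound score lies in [-1, 1]; lower = more negative tone.
    return [analyzer.polarity_scores(p)["compound"] for p in posts]

def emotional_intensity_gap(observed_posts: list[str],
                            synthetic_posts: list[str]) -> dict:
    obs = compound_scores(observed_posts)
    syn = compound_scores(synthetic_posts)
    return {
        # mean_gap > 0: synthetic posts skew more negative, matching
        # the abstract's "generally more negative" finding.
        "mean_gap": mean(obs) - mean(syn),
        # dispersion_gap > 0: observed discourse shows the broader
        # emotional variation the paper reports.
        "dispersion_gap": stdev(obs) - stdev(syn),
    }
```

Analogous per-post features for the other dimensions (e.g., length distributions for structural regularity, vocabulary statistics for lexical framing) could then feed an event-level aggregate like the Caricature Gap sketch above.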
