ArXiv TLDR

Mapping how LLMs debate societal issues when shadowing human personality traits, sociodemographics and social media behavior

2604.27624

Ali Aghazadeh Ardebili, Massimo Stella

cs.CL, cs.AI, cs.CY, cs.HC, cs.LG

TLDR

This paper introduces Cognitive Digital Shadows (CDS), a 190,000-record synthetic corpus for analyzing how LLMs debate controversial societal issues when prompted to shadow human personas.

Key contributions

  • Introduces Cognitive Digital Shadows (CDS), a 190,000-record synthetic corpus for LLM discourse analysis.
  • Each response is generated by one of 19 LLMs, prompted to shadow either a human persona or an AI-assistant role, on 4 controversial societal topics.
  • Persona-conditioned records encode 17 sociodemographic and psychological attributes, providing data that links LLM prompts, language, stances, and reasoning.
  • A pooling platform with user-friendly dashboards enables interactive, group-level comparisons of emotional and semantic framing across personas, topics, and models.
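To make the record structure concrete, here is a minimal Python sketch of how a CDS-like record and a group-level comparison might look. Everything in it is an assumption for illustration: the `ShadowRecord` class, its field names, the topic labels, the prompt wording in `persona_prompt`, and the -1/0/+1 stance coding are hypothetical and do not reflect the dataset's actual schema or the authors' prompts.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from statistics import mean

# Hypothetical labels for the four CDS topics (names are assumptions).
TOPICS = ("vaccines_healthcare", "social_media_disinformation",
          "gender_gap_science", "stem_stereotypes")

@dataclass
class ShadowRecord:
    """Hypothetical shape of one CDS-like record; not the dataset's real schema."""
    model: str    # which of the 19 LLMs produced the text
    role: str     # "persona" or "assistant"
    topic: str    # one of TOPICS
    text: str     # generated response
    stance: int   # assumed coding: -1 against, 0 neutral, +1 in favour
    persona: dict = field(default_factory=dict)  # empty for assistant-role records

def persona_prompt(persona: dict, topic: str) -> str:
    """Build an illustrative persona-conditioned prompt (wording is an assumption)."""
    traits = "; ".join(f"{k}: {v}" for k, v in sorted(persona.items()))
    return (f"You are a person with the following profile: {traits}. "
            f"Share your honest opinion on {topic.replace('_', ' ')}.")

def mean_stance_by(records, attribute: str, topic: str) -> dict:
    """Group persona records on one topic by an attribute value and average stance."""
    groups = defaultdict(list)
    for r in records:
        if r.role == "persona" and r.topic == topic and attribute in r.persona:
            groups[r.persona[attribute]].append(r.stance)
    return {value: mean(stances) for value, stances in groups.items()}

# Toy usage with made-up records (not real CDS data).
records = [
    ShadowRecord("model_a", "persona", "vaccines_healthcare", "...", 1, {"age_group": "18-29"}),
    ShadowRecord("model_b", "persona", "vaccines_healthcare", "...", -1, {"age_group": "60+"}),
    ShadowRecord("model_a", "persona", "vaccines_healthcare", "...", 0, {"age_group": "60+"}),
    ShadowRecord("model_a", "assistant", "vaccines_healthcare", "...", 1),
]
print(mean_stance_by(records, "age_group", "vaccines_healthcare"))
# {'18-29': 1, '60+': -0.5}
```

Grouping on a single persona attribute per topic mirrors, in simplified form, the group-level comparisons the dashboards are described as supporting; a fuller analysis would also split results by model and compare persona records against assistant-role records.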

Why it matters

LLMs can strongly shape social discourse, so it matters how their outputs shift with the social context they are prompted with. CDS provides a controlled dataset and prompting framework for studying this variation, and the framework supports future audits of LLM bias, social sensitivity, and alignment, which can inform more responsible AI development.

Original Abstract

Large Language Models (LLMs) can strongly shape social discourse, yet datasets investigating how LLM outputs vary across controlled social and contextual prompting remain sparse. Cognitive Digital Shadows (CDS) is a 190,000-record synthetic corpus supporting analyses of LLM-generated discourse. Each CDS record is generated by one of 19 LLMs, prompted to shadow either a human persona or an AI-assistant role. CDS contains LLM responses on 4 controversial societal topics: vaccines/healthcare, social media disinformation, the gender gap in science, and STEM stereotypes. Persona-conditioned records encode 17 sociodemographic and psychological attributes, providing data linking LLMs' prompts, language, stances and reasoning. Texts are validated for topic anchoring and can support emotional analyses via interpretable NLP (e.g. textual forma mentis networks). CDS is enriched by a pooling platform with user-friendly dashboards, enabling easy, interactive group-level comparisons of emotional and semantic framing across personas, topics and models. The CDS prompting framework supports future audits of LLMs' bias, social sensitivity and alignment.
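Textual forma mentis networks represent a text as a network of associated words whose nodes carry emotional labels, so a concept's framing can be read off its network neighbourhood. The Python sketch below is a deliberately simplified stand-in, not the method used in the paper: it links words by sliding-window co-occurrence rather than the syntactic and semantic associations forma mentis networks are typically built from, and the tiny `VALENCE` lexicon and `STOPWORDS` set are made up for illustration. The function names `cooccurrence_network` and `frame_valence` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Toy valence lexicon (assumption); real analyses rely on curated psychological norms.
VALENCE = {"safe": "positive", "protect": "positive", "trust": "positive",
           "risk": "negative", "fear": "negative", "harm": "negative"}

# Minimal illustrative stopword list (assumption).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "is", "are", "i", "that", "they", "can"}

def tokens(text: str) -> list:
    """Lowercase, keep alphabetic runs, drop stopwords."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]

def cooccurrence_network(texts, window: int = 2) -> dict:
    """Undirected word network: link words appearing within `window` tokens of each other."""
    graph = defaultdict(set)
    for text in texts:
        toks = tokens(text)
        for i, w in enumerate(toks):
            for v in toks[i + 1:i + 1 + window]:
                if v != w:
                    graph[w].add(v)
                    graph[v].add(w)
    return graph

def frame_valence(graph: dict, concept: str) -> Counter:
    """Count valence labels among a concept's neighbours (its local semantic frame)."""
    return Counter(VALENCE.get(n, "neutral") for n in graph.get(concept, ()))

# Toy usage on two invented sentences.
texts = ["Vaccines are safe and protect children", "I fear vaccines carry risk"]
g = cooccurrence_network(texts)
print(sorted(frame_valence(g, "vaccines").items()))
# [('negative', 2), ('neutral', 1), ('positive', 2)]
```

Here "vaccines" ends up framed by an even mix of positive and negative associates; comparing such neighbourhood profiles across personas, topics, or models is the kind of emotional-framing contrast the abstract describes, in a much cruder form.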
