When Can Digital Personas Reliably Approximate Human Survey Findings?
Mumin Jia, Yilin Chen, Divya Sharma, Jairo Diaz-Rodriguez
TLDR
This paper evaluates when LLM-powered digital personas can reliably substitute for human survey respondents, finding that they align with aggregate response distributions but struggle with individual-level predictions.
Key contributions
- Evaluated LLM personas against human survey data (LISS panel) across architectures, LLMs, and tasks.
- Personas align with human response distributions for stable attributes but fail at individual-level prediction.
- Performance depends on human response structure (low variability) more than LLM choice.
- Retrieval-augmented architectures provide the clearest gains in persona reliability.
Why it matters
This research provides crucial guidance for survey researchers considering LLM-powered personas. It clarifies their strengths in capturing aggregate trends and limitations for individual-level insights. Understanding these boundaries is essential for designing reliable and ethical survey methodologies in the age of AI.
Original Abstract
Digital personas powered by Large Language Models (LLMs) are increasingly proposed as substitutes for human survey respondents, yet it remains unclear when they can reliably approximate human survey findings. We answer this question using the LISS panel, constructing personas from respondents' background variables and pre-2023 survey histories, then testing them against the same respondents' held-out post-cutoff answers. Across four persona architectures, three LLMs, and two prediction tasks, we assess performance at the question, respondent, distributional, equity, and clustering levels. Digital personas improve alignment with human response distributions, especially in domains tied to stable attributes and values, but remain limited for individual prediction and fail to recover multivariate respondent structure. Retrieval-augmented architectures provide the clearest gains, but performance depends more on human response structure than on model choice: personas perform best for low-variability questions and common respondent patterns, and worst for subjective, heterogeneous, or rare responses. Our results provide practical guidance on when digital personas could be appropriate for survey research and when human validation remains necessary.
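The abstract's central contrast, that personas can match aggregate response distributions while failing at individual prediction, can be illustrated with a toy sketch. This is not the paper's actual metrics, models, or LISS data; the two measures below (total variation distance and exact-match accuracy) are illustrative stand-ins for distributional- and respondent-level evaluation:

```python
import numpy as np
from collections import Counter

def individual_accuracy(human, persona):
    """Fraction of respondents whose persona predicts their exact answer."""
    human, persona = np.asarray(human), np.asarray(persona)
    return float((human == persona).mean())

def total_variation_distance(human, persona, categories):
    """TV distance between the two marginal response distributions (0 = identical)."""
    h, p = Counter(human), Counter(persona)
    n, m = len(human), len(persona)
    return 0.5 * sum(abs(h[c] / n - p[c] / m) for c in categories)

# Toy 5-point Likert data: shuffling the human answers preserves the
# aggregate distribution exactly while scrambling who said what.
rng = np.random.default_rng(0)
human = rng.choice([1, 2, 3, 4, 5], size=1000, p=[.1, .2, .4, .2, .1])
persona = rng.permutation(human)  # same distribution, different individuals

print(total_variation_distance(human, persona, [1, 2, 3, 4, 5]))  # 0.0
print(individual_accuracy(human, persona))  # near chance, well below 1.0
```

The permuted predictions score perfectly on the distributional measure yet perform near chance at the individual level, which is why evaluating personas only in aggregate can overstate their reliability.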