PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior
James Flemings, Murali Annavaram
TLDR
PrivacySIM evaluates LLMs' ability to simulate individual privacy decisions, finding that persona conditioning improves accuracy but models still fall well short of matching real users.
Key contributions
- Introduces PrivacySIM, an evaluation suite benchmarking LLM privacy simulation against 1,000 real users.
- Finds that persona conditioning consistently improves simulation accuracy, yet even the strongest model reaches only 40.4%.
- Reveals that stated privacy attitudes often diverge from actual behavior, making them poor predictors.
- Identifies users with high AI experience but low stated privacy attitudes as the most challenging to simulate.
Why it matters
This paper introduces PrivacySIM, a benchmark for assessing how well LLMs can mimic individual privacy choices. It exposes current limitations: even when conditioned on persona data, LLMs remain far from accurate. This work matters for building more trustworthy, user-centric AI systems.
Original Abstract
Large language models (LLMs) are increasingly used to simulate human behavior, but their ability to simulate *individual* privacy decisions is not well understood. In this paper, we address the problem of evaluating whether a core set of user persona attributes can drive LLMs to simulate individual-level privacy behavior. We introduce PrivacySIM, an evaluation suite that benchmarks LLM simulation of user privacy behavior against the ground-truth responses of 1,000 users. These users are drawn from five published user studies on privacy spanning LLM healthcare consultations, conversational agents, and chatbots. Drawing on these user studies, we hypothesize three persona facets as plausible predictors of privacy decision-making: demographics, previous experiences, and stated privacy attitudes. We condition nine frontier LLMs on subsets of these three facets and measure how often each model's response to a data-sharing scenario matches the user's actual response. Our findings show that (1) privacy persona conditioning consistently improves simulation quality over no-persona conditioning, but even the strongest model (40.4% accuracy) remains far from faithfully simulating individual privacy decisions. (2) A user's stated privacy attitudes alone may not be the best predictor because they often diverge from the user's actual privacy behavior. (3) Users with high AI/chatbot experience but low stated privacy attitudes are the most challenging to simulate. PrivacySIM is a first step toward understanding and improving the capabilities of LLMs to simulate user privacy decisions. We release PrivacySIM to enable further evaluation of LLM privacy simulation.
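The evaluation loop the abstract describes (condition a model on a subset of the three persona facets, pose the user's data-sharing scenario, and check the answer against the user's real one) can be sketched as below. This is a minimal illustration of the setup, not PrivacySIM's released harness: the `User` fields, the "share"/"decline" answer format, and the `query_llm` wrapper are assumptions made for the sketch.

```python
import itertools
from dataclasses import dataclass
from typing import Callable

@dataclass
class User:
    demographics: str   # e.g. age, gender, education (assumed fields)
    experiences: str    # prior AI/chatbot experience
    attitudes: str      # stated privacy attitudes
    scenario: str       # the data-sharing scenario posed in the study
    ground_truth: str   # the user's actual answer, e.g. "share" / "decline"

FACETS = ("demographics", "experiences", "attitudes")

def build_prompt(user: User, facets: tuple[str, ...]) -> str:
    """Condition the prompt on a chosen subset of persona facets
    (an empty subset gives the no-persona baseline)."""
    persona = "\n".join(getattr(user, f) for f in facets)
    return (
        f"You are role-playing the following user:\n{persona or '(no persona)'}\n\n"
        f"Scenario: {user.scenario}\n"
        "Would this user agree to share their data? Answer 'share' or 'decline'."
    )

def simulation_accuracy(
    users: list[User],
    facets: tuple[str, ...],
    query_llm: Callable[[str], str],  # hypothetical model-call wrapper
) -> float:
    """Fraction of users whose simulated answer matches their real answer."""
    hits = sum(
        query_llm(build_prompt(u, facets)).strip().lower() == u.ground_truth
        for u in users
    )
    return hits / len(users)

# Sweep every facet subset, including the no-persona baseline:
# for r in range(len(FACETS) + 1):
#     for subset in itertools.combinations(FACETS, r):
#         print(subset, simulation_accuracy(users, subset, query_llm))
```

Accuracy here is simply the per-model, per-subset match rate against the 1,000 ground-truth responses; the paper's headline 40.4% is this figure for the strongest of the nine models evaluated.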