ArXiv TLDR

Profiling for Pennies: Unveiling the Privacy Iceberg of LLM Agents

2605.06232

Jiahao Chen, Qi Zhang, Ruixiao Lin, Chunyi Zhou, Tianyu Du + 5 more

cs.CR

TLDR

LLM agents can create detailed personal profiles cheaply and quickly, exposing significant privacy risks due to platform failures and lack of awareness.

Key contributions

  • LLM agents enable cheap, automated, and in-depth personal profiling, posing significant privacy risks.
  • Reveals a gap between public privacy concerns and LLM platforms' technical and policy responses.
  • Proposes PrivacyIceberg, a three-tier model for categorizing real-world human privacy risks.
  • Developed IcebergExplorer, an auditing tool that reconstructs high-fidelity profiles with >90% factual accuracy in <10 minutes for <$3.

Why it matters

This paper highlights a critical, underexplored privacy threat from LLM agents: automated personal profiling. It demonstrates how easily detailed profiles can be created and exposes the failure of current platforms to address these risks, urging immediate action from all stakeholders.

Original Abstract

Large Language Models (LLMs) have revolutionized how information is collected, aggregated, and reasoned over. However, this enables a novel and accessible vector of privacy intrusion: automated and in-depth personal profiling, which engenders a chilling effect of "peepers everywhere". Existing research primarily unfolds from the training pipeline of LLMs, emphasizing the exposure of Personally Identifiable Information (PII) through memorization, while privacy studies from a human-centric perspective remain underexplored. To fill this void, we empirically investigate privacy perception in the real world through the lens of human awareness and the practices of LLM-integrated platforms, revealing a significant dissonance: platforms fail to address public privacy concerns either technically or in policy. To facilitate a systematic and quantifiable study of privacy risk, we propose the PrivacyIceberg, which categorizes real-world human privacy risks into three tiers: explicitly searched, contextually inferred, and deeply aggregated, based on the sophistication of LLM exploitation. We developed IcebergExplorer to audit privacy exposure, utilizing minimal PII as a search seed to reconstruct high-fidelity profiles, achieving over 90% factual accuracy within 10 minutes at a cost under $3 in real-world scenarios. Additionally, we identify six root causes contributing to such privacy disclosures and propose multi-stakeholder countermeasures for LLM vendors, individuals, and data publishers.
