ArXiv TLDR

From Vulnerable Data Subjects to Vulnerabilizing Data Practices: Navigating the Protection Paradox in AI-Based Analyses of Platformized Lives

arXiv: 2604.15990

Delfina S. Martinez Pandiani, Ella Streefkerk, Laurens Naudts, Paula Helm

cs.CY, cs.AI, cs.CV, cs.HC

TLDR

This paper redefines vulnerability in AI data practices, showing how well-intentioned analyses can inadvertently create new forms of exposure and extraction.

Key contributions

  • Shifts focus from static data subject vulnerability to actively enacted vulnerability via data practices.
  • Introduces a "protection paradox" where data-driven protection efforts create new computational exposures.
  • Develops a reflexive ethics protocol for AI pipelines, addressing dataset design, operationalization, inference, and dissemination.
  • Identifies four cross-cutting vulnerabilizing factors: exposure, monetization, narrative fixing, and algorithmic optimization.

Why it matters

This paper is crucial for ethical AI development, especially in "AI for Social Good" contexts. It provides a practical framework for identifying and mitigating unintended harm, so that data practices genuinely protect individuals rather than precarize them. This helps researchers navigate complex ethical challenges when working with platformized data.

Original Abstract

This paper traces a conceptual shift from understanding vulnerability as a static, essentialized property of data subjects to examining how it is actively enacted through data practices. Unlike reflexive ethical frameworks focused on missing or counter-data, we address the condition of abundance inherent to platformized life: a context where a near inexhaustible mass of data points already exists, shifting the ethical challenge to the researcher's choices in operating upon this existing mass. We argue that the ethical integrity of data science depends not just on who is studied, but on how technical pipelines transform "vulnerable" individuals into data subjects whose vulnerability can be further precarized. We develop this argument through an AI for Social Good (AI4SG) case: a journalist's request to use computer vision to quantify child presence in monetized YouTube 'family vlogs' for regulatory advocacy. This case reveals a "protection paradox": how data-driven efforts to protect vulnerable subjects can inadvertently impose new forms of computational exposure, reductionism, and extraction. Using this request as a point of departure, we perform a methodological deconstruction of the AI pipeline to show how granular technical decisions are ethically constitutive. We contribute a reflexive ethics protocol that translates these insights into a reflexive roadmap for research ethics surrounding platformized data subjects. Organized around four critical junctures (dataset design, operationalization, inference, and dissemination), the protocol identifies technical questions and ethical tensions where well-intentioned work can slide into renewed extraction or exposure. For every decision point, the protocol offers specific prompts to navigate four cross-cutting vulnerabilizing factors: exposure, monetization, narrative fixing, and algorithmic optimization. Rather than uncritically...
