ArXiv TLDR

Repurposing Image Diffusion Models for Adversarial Synthetic Structured Data: A Case Study of Ground Truth Drift

2605.00788

Adam Arthur, Christopher Schwartz

cs.CR

TLDR

This paper explores repurposing image diffusion models like Stable Diffusion to generate adversarial synthetic structured data, potentially causing ground truth drift.

Key contributions

  • Repurposes Stable Diffusion U-Net for tabular data by reshaping rows into pseudo-images.
  • Tests various feature layouts, leveraging the U-Net's inductive bias for spatial locality.
  • Distinguishes statistical from perceptual realism for synthetic content.
  • Introduces "synthetic evidence" for AI-generated material consumed by machines, as a category distinct from synthetic media.
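The row-to-pseudo-image step can be sketched in a few lines. This is an illustrative assumption, not the paper's actual preprocessing: the 4×4 layout, the toy feature vector, and the min-max scaling are all hypothetical choices standing in for whatever encoding and feature placement the authors tested.

```python
import numpy as np

def row_to_pseudo_image(row, side=4):
    """Pad a 1-D feature vector to side*side values and reshape it into
    a (side, side, 1) single-channel pseudo-image with values in [0, 1].

    Feature placement matters here: because the U-Net's convolutions
    assume spatial locality, which features end up adjacent in the grid
    is itself a design variable, as the paper notes.
    """
    row = np.asarray(row, dtype=np.float32)
    padded = np.zeros(side * side, dtype=np.float32)
    padded[: row.size] = row
    # Min-max scale so "pixel" intensities lie in [0, 1]
    lo, hi = padded.min(), padded.max()
    if hi > lo:
        padded = (padded - lo) / (hi - lo)
    return padded.reshape(side, side, 1)

# Toy Adult-Income-style row (e.g. age, education-num, hours-per-week,
# capital-gain, plus encoded categoricals) — values are made up.
row = [39, 13, 40, 2174, 0, 1, 0, 4, 1, 0, 0, 1, 0, 1]
img = row_to_pseudo_image(row)
print(img.shape)  # (4, 4, 1)
```

Different layouts would correspond to permuting `row` before the reshape, so that correlated features land in neighboring grid cells.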

Why it matters

This paper reveals a low-cost threat: an attacker who cannot train a tabular-specific generator can instead repurpose an off-the-shelf image diffusion model to produce adversarial synthetic structured data. By framing "synthetic evidence" and "ground truth drift," it shows how AI-generated data can silently corrupt data pipelines that never interrogate provenance, which matters for anticipating new attack vectors and securing data integrity.

Original Abstract

Public image diffusion models are now powerful enough that an attacker without the resources to train a tabular-specific generator may repurpose one off the shelf. This study tests that possibility directly. An unmodified Stable Diffusion U-Net is applied to the UCI Adult Income dataset by reshaping each row into a small single-channel pseudo-image. The architecture's inductive bias toward spatial locality makes feature placement a design variable, and several layouts are tested. However, this is only the beginning of the story, as this paper also draws two philosophical distinctions. One separates statistical from perceptual realism: whether synthetic content holds up to a machine's correlation audits or a human's sensory inspection. The other introduces synthetic evidence as a category alongside synthetic media: AI-generated material whose consumer is a machine in a closed evidentiary pipeline rather than a person in an open information system. An attacker succeeds with synthetic evidence by thinking like the machine that will receive it. And the more the attacker succeeds, the more they can induce ground truth drift: the silent reclassification of AI-generated outputs as authentic when reused in pipelines that do not interrogate their provenance.
