ArXiv TLDR

Generate "Normal", Edit Poisoned: Branding Injection via Hint Embedding in Image Editing

arXiv: 2605.10600

Desen Sun, Jason Hon, Howe Wang, Saarth Rajan, Meng Xu, and one additional author

cs.CR

TLDR

This paper reveals a new vulnerability: nearly invisible branding hidden in an image can be recognized and re-rendered by generative AI models during editing, even when the editing prompt never mentions it.

Key contributions

  • Discovers a vulnerability: hidden branding in images can be re-rendered by generative AI.
  • Shows models re-render nearly invisible hints onto semantically related objects during editing.
  • Evaluates two attack settings across six injected payloads: phishing-based (44.4% average success) and poison-based (32.2% average success).
  • Develops a mitigation that succeeds 87.4% of the time against the phishing-based attack and 92.3% against the poison-based attack, on average.

Why it matters

This paper uncovers a critical vulnerability in generative AI image editing: hidden branding can be injected into an image and later re-rendered by downstream models. Understanding this behavior is vital for preventing stealthy content-injection attacks and for building more secure image-generation pipelines.

Original Abstract

With the rapid advancement of generative AI, users increasingly rely on image-generation models for image design and creation. To achieve faithful outputs, users typically engage in multi-turn interactions during image refinement: a text-to-image generation phase followed by a text-guided image-to-image editing phase. In this paper, we investigate a novel security vulnerability associated with such a workflow. Our key insight is that a nearly invisible hint, like branding information (e.g., a logo), embedded in an input image can be recognized by downstream generative models and subsequently re-rendered onto semantically related objects, even when the user prompt does not explicitly mention it. This form of hidden payload injection makes the attack stealthy. We study two realistic attack scenarios. The first is a phishing-based setting, in which an attacker controls an online image generation service and injects hidden content into generated images before they are returned to users. The second is a poison-based setting, where an attacker distributes a compromised text-to-image diffusion model whose output contains hidden content. We evaluate both attacks using six injected payloads, including well-known logos and customized designs, and demonstrate that the two attacks can achieve success rates of 44.4% and 32.2% on average, respectively, while ensuring the injected logos are visually imperceptible. We also develop a mitigation solution that achieves an average success rate of 87.4% and 92.3% against the phishing-based and poison-based attacks, respectively.
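The abstract does not say how the hidden hint is embedded, so as a purely illustrative sketch (not the paper's method), the snippet below shows one way an attacker-controlled service could blend a logo into an output image at very low opacity before returning it to the user, as in the phishing-based setting. The helper name `embed_faint_logo`, the PIL-based alpha blend, and the 5% opacity are all assumptions for illustration.

```python
from PIL import Image

def embed_faint_logo(base_path, logo_path, out_path, alpha=0.05, position=(0, 0)):
    """Blend a logo into an image at very low opacity so it stays
    nearly invisible to humans while remaining present in pixel values."""
    base = Image.open(base_path).convert("RGBA")
    logo = Image.open(logo_path).convert("RGBA")

    # Scale the logo's own alpha channel down (here to ~5%) so fully
    # transparent regions stay transparent and the rest becomes a faint overlay.
    r, g, b, a = logo.split()
    a = a.point(lambda v: int(v * alpha))
    faint_logo = Image.merge("RGBA", (r, g, b, a))

    # Composite the faint logo onto the base image before it is returned.
    out = base.copy()
    out.paste(faint_logo, position, faint_logo)
    out.convert("RGB").save(out_path)

# Hypothetical usage in the phishing-based setting:
# embed_faint_logo("generated.png", "brand_logo.png", "returned.png")
```

The paper's actual embedding, and the downstream model's tendency to re-render the hint onto semantically related objects, are more involved than this fixed blend; the sketch only illustrates why such a payload can be imperceptible to a casual viewer.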
