TwoHamsters: Benchmarking Multi-Concept Compositional Unsafety in Text-to-Image Models
Chaoshuo Zhang, Yibo Liang, Mengke Tian, Chenhao Lin, Zhengyu Zhao, et al.
TLDR
TwoHamsters benchmarks "Multi-Concept Compositional Unsafety" (MCCU) in text-to-image models, showing that current defenses fail to prevent unsafe content generated from combinations of individually benign concepts.
Key contributions
- Formalizes "Multi-Concept Compositional Unsafety" (MCCU), a new vulnerability in text-to-image models.
- Introduces TwoHamsters, a 17.5k-prompt benchmark to evaluate MCCU risks in T2I models.
- Evaluates 10 SOTA T2I models and 16 defense mechanisms against MCCU using TwoHamsters.
- Reveals severe MCCU risks: FLUX reaches a 99.52% MCCU generation success rate, while LLaVA-Guard attains only 41.06% recall in detecting the resulting content.
Why it matters
This paper highlights a critical, overlooked safety gap in text-to-image models: the generation of unsafe content from seemingly benign concept combinations. By formalizing MCCU and providing a large-scale benchmark, it offers a concrete tool for developing more effective safety alignment. The findings underscore the need for new defense mechanisms that go beyond current explicit-content filters.
Original Abstract
Despite the remarkable synthesis capabilities of text-to-image (T2I) models, safeguarding them against content violations remains a persistent challenge. Existing safety alignments primarily focus on explicit malicious concepts, often overlooking the subtle yet critical risks of compositional semantics. To address this oversight, we identify and formalize a novel vulnerability: Multi-Concept Compositional Unsafety (MCCU), where unsafe semantics stem from the implicit associations of individually benign concepts. Based on this formulation, we introduce TwoHamsters, a comprehensive benchmark comprising 17.5k prompts curated to probe MCCU vulnerabilities. Through a rigorous evaluation of 10 state-of-the-art models and 16 defense mechanisms, our analysis yields 8 pivotal insights. In particular, we demonstrate that current T2I models and defense mechanisms face severe MCCU risks: on TwoHamsters, FLUX achieves an MCCU generation success rate of 99.52%, while LLaVA-Guard only attains a recall of 41.06%, highlighting a critical limitation of the current paradigm for managing hazardous compositional generation.