SEAL: Semantic-aware Single-image Sticker Personalization with a Large-scale Sticker-tag Dataset
Changhyun Roh, Yonghyun Jeong, Jonghyun Lee, Chanho Eom, Jihyong Oh
TLDR
SEAL is a plug-and-play module that improves single-image sticker personalization in diffusion models by preventing overfitting and enhancing contextual control.
Key contributions
- Introduces SEAL, a plug-and-play module for single-image sticker personalization in diffusion models.
- SEAL employs Semantic-guided Spatial Attention Loss, Split-merge Token Strategy, and Structure-aware Layer Restriction.
- Prevents visual entanglement and structural rigidity, improving identity preservation and contextual controllability.
- Presents StickerBench, a large-scale dataset with structured tags for systematic sticker personalization evaluation.
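To make the six-attribute tag schema concrete, here is a minimal sketch (not from the paper; the dataclass, field names, and `to_prompt` helper are hypothetical) of how a StickerBench-style tag record could compose a context prompt while holding the identity token fixed:

```python
from dataclasses import dataclass

@dataclass
class StickerTags:
    # Hypothetical record mirroring the six attributes named in the paper:
    # Appearance, Emotion, Action, Camera Composition, Style, Background.
    appearance: str
    emotion: str
    action: str
    camera_composition: str
    style: str
    background: str

    def to_prompt(self, identity_token: str = "<sks>") -> str:
        # Vary context attributes around a fixed identity placeholder,
        # the interface the dataset's annotations are meant to provide.
        return (f"a sticker of {identity_token}, {self.appearance}, "
                f"{self.emotion}, {self.action}, {self.camera_composition}, "
                f"{self.style} style, {self.background}")

tags = StickerTags("round blue body", "happy", "waving",
                   "close-up", "flat cartoon", "plain white background")
prompt = tags.to_prompt()
```

Swapping any single field while keeping `identity_token` fixed is exactly the kind of controlled variation that lets identity disentanglement and contextual controllability be evaluated systematically.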
Why it matters
This paper addresses key challenges in single-image personalization for stickers, a growing application. By preventing overfitting and enhancing control, SEAL makes diffusion models more practical for generating custom stickers. The new StickerBench dataset also provides a crucial resource for future research and standardized evaluation in this domain.
Original Abstract
Synthesizing a target concept from a single reference image is challenging in diffusion-based personalized text-to-image generation, particularly for sticker personalization where prompts often require explicit attribute edits. With only one reference, test-time fine-tuning (TTF) methods tend to overfit, producing *visual entanglement*, where background artifacts are absorbed into the learned concept, and *structural rigidity*, where the model memorizes reference-specific spatial configurations and loses contextual controllability. To address these issues, we introduce **SE**mantic-aware single-image sticker person**AL**ization (**SEAL**), a plug-and-play, architecture-agnostic adaptation module that integrates into existing personalization pipelines without modifying their U-Net-based diffusion backbones. SEAL applies three components during embedding adaptation: (1) a Semantic-guided Spatial Attention Loss, (2) a Split-merge Token Strategy, and (3) Structure-aware Layer Restriction. To support sticker-domain personalization with attribute-level control, we present StickerBench, a large-scale sticker image dataset with structured tags under a six-attribute schema (Appearance, Emotion, Action, Camera Composition, Style, Background). These annotations provide a consistent interface for varying context while keeping target identity fixed, enabling systematic evaluation of identity disentanglement and contextual controllability. Experiments show that SEAL consistently improves identity preservation while maintaining contextual controllability, highlighting the importance of explicit spatial and structural constraints during test-time adaptation. The code, StickerBench, and project page will be publicly released.
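The paper does not give the exact form of the Semantic-guided Spatial Attention Loss, but one plausible instantiation of the idea (a sketch under that assumption, with hypothetical function and argument names) is to penalize the concept token's cross-attention mass that falls outside a semantic foreground mask, discouraging background artifacts from being absorbed into the learned concept:

```python
import numpy as np

def semantic_spatial_attn_loss(attn_map, fg_mask, eps=1e-6):
    """Hypothetical masked-attention penalty, not the paper's formulation.

    attn_map: (B, H, W) cross-attention of the learned concept token
              (in practice gathered from the U-Net's attention layers).
    fg_mask:  (B, H, W) binary mask, 1 on the subject region.
    """
    # Normalize each map to a spatial distribution, then measure the
    # attention mass leaking outside the semantic foreground mask.
    attn = attn_map / (attn_map.sum(axis=(1, 2), keepdims=True) + eps)
    outside = attn * (1.0 - fg_mask)
    return outside.sum(axis=(1, 2)).mean()
```

Under this sketch the loss is 0 when all concept-token attention lies inside the mask and approaches 1 when it all leaks to the background, giving the embedding-adaptation step an explicit spatial constraint of the kind the abstract argues for.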