FashionStylist: An Expert Knowledge-enhanced Multimodal Dataset for Fashion Understanding
Kaidong Feng, Zhuoxuan Huang, Huizhong Guo, Yuting Jin, Xinyu Chen + 5 more
TLDR
FashionStylist is a new expert-annotated multimodal dataset for holistic fashion understanding. It supports three tasks: outfit-to-item grounding, outfit completion, and outfit evaluation.
Key contributions
- Introduces FashionStylist, an expert-annotated multimodal dataset for holistic fashion understanding.
- Supports three representative tasks: outfit-to-item grounding, outfit completion, and expert-level outfit evaluation (style, season, occasion, and overall coherence).
- Provides professionally grounded annotations at both item and outfit levels.
Why it matters
Existing fashion datasets are fragmented and task-specific, and they offer limited support for expert-level reasoning about whole outfits. FashionStylist addresses this gap with a unified, expert-annotated benchmark. According to the authors' experiments, it also serves as an effective training resource for improving grounding, completion, and outfit-level semantic evaluation in MLLM-based fashion systems.
Original Abstract
Fashion understanding requires both visual perception and expert-level reasoning about style, occasion, compatibility, and outfit rationale. However, existing fashion datasets remain fragmented and task-specific, often focusing on item attributes, outfit co-occurrence, or weak textual supervision, and thus provide limited support for holistic outfit understanding. In this paper, we introduce FashionStylist, an expert-annotated benchmark for holistic and expert-level fashion understanding. Constructed through a dedicated fashion-expert annotation pipeline, FashionStylist provides professionally grounded annotations at both the item and outfit levels. It supports three representative tasks: outfit-to-item grounding, outfit completion, and outfit evaluation. These tasks cover realistic item recovery from complex outfits with layering and accessories, compatibility-aware composition beyond co-occurrence matching, and expert-level assessment of style, season, occasion, and overall coherence. Experimental results show that FashionStylist serves not only as a unified benchmark for multiple fashion tasks, but also as an effective training resource for improving grounding, completion, and outfit-level semantic evaluation in MLLM-based fashion systems.