
CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity

arXiv:2604.11632

Xuefeng Wei, Zhixuan Wang, Xuan Zhou, Zhi Qu, Hongyao Li + 3 more

cs.CL

TLDR

CArtBench is a new benchmark evaluating Vision-Language Models on complex Chinese art understanding, interpretation, and authenticity tasks.

Key contributions

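  • Introduces CArtBench, a museum-grounded benchmark built by aligning image-bearing Palace Museum objects from Wikidata with authoritative catalog pages, spanning five art categories across multiple dynasties.
  • Defines four subtasks, including CURATORQA for evidence-grounded recognition and reasoning and CATALOGCAPTION for structured, four-section expert-style appreciation.
  • Adds REINTERPRET for defensible reinterpretation with expert ratings and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds.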

Why it matters

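This paper addresses the gap in evaluating VLMs on nuanced art understanding, beyond short-form recognition and QA. Across nine representative VLMs, CArtBench exposes specific limits in expert-level reasoning about Chinese art: high overall CURATORQA accuracy can mask sharp drops on hard evidence linking and style-to-period inference, long-form appreciation remains far from expert references, and authenticity discrimination stays near chance.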

Original Abstract

We introduce CARTBENCH, a museum-grounded benchmark for evaluating vision-language models (VLMs) on Chinese artworks beyond short-form recognition and QA. CARTBENCH comprises four subtasks: CURATORQA for evidence-grounded recognition and reasoning, CATALOGCAPTION for structured four-section expert-style appreciation, REINTERPRET for defensible reinterpretation with expert ratings, and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds. CARTBENCH is built by aligning image-bearing Palace Museum objects from Wikidata with authoritative catalog pages, spanning five art categories across multiple dynasties. Across nine representative VLMs, we find that high overall CURATORQA accuracy can mask sharp drops on hard evidence linking and style-to-period inference; long-form appreciation remains far from expert references; and authenticity-oriented diagnostic discrimination stays near chance, underscoring the difficulty of connoisseur-level reasoning for current models.
