CArtBench: Evaluating Vision-Language Models on Chinese Art Understanding, Interpretation, and Authenticity
Xuefeng Wei, Zhixuan Wang, Xuan Zhou, Zhi Qu, Hongyao Li, et al.
TLDR
CArtBench is a new benchmark evaluating Vision-Language Models on complex Chinese art understanding, interpretation, and authenticity tasks.
Key contributions
- Introduces CArtBench, a museum-grounded benchmark for evaluating VLMs on Chinese art understanding.
- Features subtasks like CURATORQA for evidence-grounded reasoning and CATALOGCAPTION for expert-style appreciation.
- Includes REINTERPRET for defensible reinterpretation and CONNOISSEURPAIRS for diagnostic authenticity discrimination.
Why it matters
This paper addresses a gap in evaluating VLMs on nuanced art understanding. CArtBench tests models beyond simple recognition, revealing that current VLMs fall short of expert-level reasoning on Chinese art — particularly on evidence linking, long-form appreciation, and authenticity judgment — and sets a higher bar for future VLM development.
Original Abstract
We introduce CARTBENCH, a museum-grounded benchmark for evaluating vision-language models (VLMs) on Chinese artworks beyond short-form recognition and QA. CARTBENCH comprises four subtasks: CURATORQA for evidence-grounded recognition and reasoning, CATALOGCAPTION for structured four-section expert-style appreciation, REINTERPRET for defensible reinterpretation with expert ratings, and CONNOISSEURPAIRS for diagnostic authenticity discrimination under visually similar confounds. CARTBENCH is built by aligning image-bearing Palace Museum objects from Wikidata with authoritative catalog pages, spanning five art categories across multiple dynasties. Across nine representative VLMs, we find that high overall CURATORQA accuracy can mask sharp drops on hard evidence linking and style-to-period inference; long-form appreciation remains far from expert references; and authenticity-oriented diagnostic discrimination stays near chance, underscoring the difficulty of connoisseur-level reasoning for current models.