ArXiv TLDR

Does it Really Count? Assessing Semantic Grounding in Text-Guided Class-Agnostic Counting

arXiv:2605.02752

Giacomo Pacini, Luca Ciampi, Nicola Messina, Nicola Tonellotto, Giuseppe Amato + 1 more

cs.CV

TLDR

A new evaluation framework shows that state-of-the-art text-guided counting models struggle with semantic grounding, often misinterpreting prompts and counting the wrong objects.

Key contributions

  • Shows that SOTA text-guided counting models often fail to ground prompts correctly, producing spurious counts.
  • Introduces PrACo++ (Prompt-Aware Counting++), a test suite pairing negative-label and distractor tests with new specialized metrics (see the sketch after this list).
  • Presents MUCCA, a multi-category dataset of real-world images for robust class-agnostic counting evaluation.
  • Extensive evaluation of 10 SOTA methods reveals significant weaknesses in prompt understanding and grounding.
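
For intuition, here is a minimal sketch of what a negative-label probe could look like, assuming a generic `model(image, prompt) -> count` interface. The function name and the spurious-count ratio are illustrative assumptions, not the actual metrics defined by PrACo++.

```python
# A minimal sketch of a negative-label probe, assuming a generic
# `model(image, prompt) -> float` counting interface. The function
# name and the spurious-count ratio below are illustrative
# assumptions, not the metrics defined by PrACo++ itself.
def negative_label_probe(model, image, positive_prompt, negative_prompts):
    """Check whether a model counts only the class its prompt names.

    A semantically grounded model should return a near-zero count for
    classes absent from the image (negative prompts) while preserving
    its count for the class actually present (the positive prompt).
    """
    positive_count = model(image, positive_prompt)
    spurious_counts = [model(image, p) for p in negative_prompts]
    # One plausible failure signal: the mean count produced for absent
    # classes, normalized by the count for the present class.
    spurious_ratio = (sum(spurious_counts) / len(spurious_counts)) / max(
        positive_count, 1e-6
    )
    return positive_count, spurious_ratio
```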

Why it matters

This paper exposes a critical flaw in text-guided counting models: their inability to correctly ground prompts, leading to unreliable counts. It provides a robust new evaluation framework and dataset, essential for developing more semantically grounded and trustworthy counting systems.

Original Abstract

Open-world text-guided class-agnostic counting (CAC) has emerged as a flexible paradigm for counting arbitrary object classes by using natural language prompts. However, current evaluation protocols primarily focus on standard counting errors within single-category images, overlooking a fundamental requirement: the ability to correctly ground the textual prompt in the visual scene. In this paper, we show that several state-of-the-art CAC models often struggle to determine which object class should be counted based on the given prompt, revealing a misalignment between textual semantics and visual object representations. This limitation leads to spurious counting responses and reduced reliability in real-world scenarios. To systematically address these limitations, we propose a new evaluation framework focused on model robustness and trustworthiness. Our contribution is two-fold: (i) we introduce PrACo++ (Prompt-Aware Counting++), a novel test suite featuring two dedicated evaluation protocols -- the negative-label test and the distractor test -- paired with new specialized metrics; and (ii) we present the MUCCA (MUlti-Category Class-Agnostic counting) evaluation dataset, a new collection of real-world images featuring multiple annotated object categories per scene, unlike existing CAC benchmarks that typically include a single category per image. Our extensive experimental evaluation of 10 state-of-the-art methods shows that, despite strong performance under standard counting metrics, current models exhibit significant weaknesses in understanding and grounding object class descriptions. Finally, we provide a quantitative analysis of how semantic similarity between prompts influences these failures. Overall, our results underscore the need for more semantically grounded architectures and offer a reliable framework for future assessment in open-world text-guided CAC methods.
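
The abstract's closing point, that semantic similarity between prompts influences failures, suggests an analysis one could approximate with off-the-shelf tools. Below is a hedged sketch using the sentence-transformers library; the encoder checkpoint and example prompts are assumptions made for illustration, and the paper's actual embedding model and metric definitions are not specified in this digest.

```python
# A hedged sketch of a prompt-similarity analysis, using the
# off-the-shelf sentence-transformers library. The encoder checkpoint
# and example prompts are assumptions made for illustration; the
# paper's actual embedding model and analysis are not specified here.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")
prompts = ["apples", "oranges", "red balloons"]  # hypothetical class prompts
embeddings = encoder.encode(prompts, convert_to_tensor=True)

# Pairwise cosine similarity between prompts: the paper's finding
# suggests that higher similarity between the target class and a
# distractor class should correlate with more counting failures.
similarity_matrix = util.cos_sim(embeddings, embeddings)
print(similarity_matrix)
```

Plotting counting error against similarity scores like these would approximate the kind of quantitative analysis the abstract describes.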
