R-CoV: Region-Aware Chain-of-Verification for Alleviating Object Hallucinations in LVLMs
Jiahao Xie, Alessio Tonioni, Nathalie Rauschmayr, Federico Tombari, Bernt Schiele
TLDR
R-CoV is a region-aware chain-of-verification method that significantly alleviates object hallucinations in large vision-language models post-hoc.
Key contributions
- Proposes R-CoV, a visual chain-of-verification to reduce object hallucinations in LVLMs.
- Elicits region-level processing from the LVLM itself to detect and alleviate its own hallucinations, mimicking how humans focus on specific image regions.
- Consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation.
- Integrates seamlessly into various LVLMs in a training-free manner without external models.
Why it matters
Object hallucinations are a major challenge for LVLMs, hindering their reliability and trustworthiness. R-CoV offers a simple, training-free solution to improve their accuracy. By addressing this, it makes LVLMs more robust and dependable for real-world applications.
Original Abstract
Large vision-language models (LVLMs) have demonstrated impressive performance in various multimodal understanding and reasoning tasks. However, they still struggle with object hallucinations, i.e., the claim of nonexistent objects in the visual input. To address this challenge, we propose Region-aware Chain-of-Verification (R-CoV), a visual chain-of-verification method to alleviate object hallucinations in LVLMs in a post-hoc manner. Motivated by how humans comprehend intricate visual information -- often focusing on specific image regions or details within a given sample -- we elicit such region-level processing from LVLMs themselves and use it as a chaining cue to detect and alleviate their own object hallucinations. Specifically, our R-CoV consists of six steps: initial response generation, entity extraction, coordinate generation, region description, verification execution, and final response generation. As a simple yet effective method, R-CoV can be seamlessly integrated into various LVLMs in a training-free manner and without relying on external detection models. Extensive experiments on several widely used hallucination benchmarks across multiple LVLMs demonstrate that R-CoV can significantly alleviate object hallucinations in LVLMs. Project page: https://github.com/Jiahao000/R-CoV.
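The six-step chain described in the abstract can be sketched as a simple pipeline over a chat-style LVLM. In the sketch below, `ask` is a placeholder for any LVLM query function, `stub_ask` is a canned stand-in model used only to demo the flow, and all prompt wordings, helper names, and canned answers are illustrative assumptions, not the paper's actual prompts.

```python
# Illustrative sketch of R-CoV's six-step chain (an assumption-laden demo,
# not the paper's implementation). `ask(prompt, image)` stands in for any
# chat-style LVLM call.

def r_cov(ask, image, question):
    """Run the six R-CoV steps and return a verified final response."""
    # Step 1: initial response generation
    initial = ask(f"Question: {question}", image)
    # Step 2: entity extraction from the initial response
    entities = ask(f"List the objects mentioned in this sentence: {initial}",
                   image).split(", ")
    # Step 3: coordinate generation, grounding each entity to an image region
    boxes = {e: ask(f"Give the bounding box of the {e} in the image.", image)
             for e in entities}
    # Step 4: region description of each grounded region
    descs = {e: ask(f"Describe the region {b} containing the candidate {e}.", image)
             for e, b in boxes.items()}
    # Step 5: verification execution, checking each entity against its region
    verified = {e: ask(f"Description: {d}. Is there a {e}? Answer yes or no.",
                       image) == "yes"
                for e, d in descs.items()}
    # Step 6: final response generation, keeping only verified objects
    kept = [e for e, ok in verified.items() if ok]
    return ask(f"Rewrite the response keeping only the verified objects "
               f"{kept}: {initial}", image)


def stub_ask(prompt, image=None):
    """Canned LVLM used only to demo the chain (a stub, not a real model)."""
    if prompt.startswith("Question:"):
        return "A dog and a cat are playing on the grass."
    if prompt.startswith("List the objects"):
        return "dog, cat"  # the cat is a hallucination in this toy example
    if prompt.startswith("Give the bounding box"):
        return "[10, 20, 50, 60]"
    if prompt.startswith("Describe the region"):
        return ("a dog lying on green grass" if "dog" in prompt
                else "an empty patch of grass")
    if prompt.endswith("Answer yes or no."):
        entity = prompt.split("Is there a ")[1].split("?")[0]
        description = prompt.split("Description: ")[1].split(". Is there")[0]
        return "yes" if entity in description else "no"
    return "A dog is playing on the grass."  # rewritten final response


print(r_cov(stub_ask, None, "What is in the image?"))
# Hallucinated "cat" fails region verification and is dropped.
```

Because each step is just another query to the same model, swapping `stub_ask` for a real LVLM call is all it would take to apply the chain post-hoc, which is what makes the method training-free and free of external detectors.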