Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
Jianzhu Bao, Haozhen Zhang, Kuicai Dong, Bozhi Wu, Sarthak Ketanbhai Modi, et al.
TLDR
ChartCF improves VLM chart understanding with significantly less training data by leveraging counterfactuals through a novel combination of data synthesis, data selection, and multimodal preference optimization.
Key contributions
- Proposes a counterfactual data synthesis pipeline using code modification to create diverse training examples.
- Implements a chart similarity-based data selection strategy to filter out overly difficult samples for efficiency.
- Utilizes multimodal preference optimization across both textual and visual modalities for robust learning.
- Achieves superior or comparable performance to existing VLMs with significantly less training data.
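The first contribution, counterfactual data synthesis via code modification, can be illustrated with a toy sketch: since a chart is rendered from code, editing a single data value in that code produces a near-identical chart whose correct answer changes. The chart snippet, the `modify_value` helper, and the QA rule below are illustrative assumptions for this digest, not the paper's actual pipeline.

```python
# Toy sketch of counterfactual chart synthesis via code modification
# (assumed example, not ChartCF's real implementation): edit one value in
# the chart-generating code so the same question -- "Which category has
# the tallest bar?" -- gets a different answer on the counterfactual chart.
import ast

ORIGINAL_CHART_CODE = """
import matplotlib.pyplot as plt
categories = ["A", "B", "C"]
values = [30, 55, 20]
plt.bar(categories, values)
"""

def modify_value(code: str, index: int, new_value: int) -> str:
    """Return counterfactual chart code with one data value replaced."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == "values"
                and isinstance(node.value, ast.List)):
            node.value.elts[index] = ast.Constant(new_value)
    return ast.unparse(tree)

def answer_tallest(code: str) -> str:
    """Simulate the QA label: the category holding the largest value."""
    ns = {}
    # Execute only the data-defining lines, skipping imports and plotting.
    data_lines = [l for l in code.splitlines()
                  if "plt" not in l and "import" not in l]
    exec("\n".join(data_lines), ns)
    return max(zip(ns["values"], ns["categories"]))[1]

# Raising the third bar from 20 to 80 flips the answer from "B" to "C".
counterfactual = modify_value(ORIGINAL_CHART_CODE, index=2, new_value=80)
print(answer_tallest(ORIGINAL_CHART_CODE))  # -> B
print(answer_tallest(counterfactual))       # -> C
```

The pair (original chart, counterfactual chart) with its flipped answer is exactly the kind of fine-grained visual contrast that standard SFT, which treats each instance independently, fails to supervise.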
Why it matters
This paper addresses data inefficiency in training VLMs for chart understanding by exploiting the programmatic nature of charts: small, code-controlled edits can drastically change a chart's semantics and correct answer. ChartCF enhances models' sensitivity to such subtle visual changes, leading to more robust and data-efficient learning. This is crucial for developing high-performing chart analysis tools with reduced data needs.
Original Abstract
Vision-Language Models (VLMs) have demonstrated remarkable progress in chart understanding, largely driven by supervised fine-tuning (SFT) on increasingly large synthetic datasets. However, scaling SFT data alone is inefficient and overlooks a key property of charts: charts are programmatically generated visual artifacts, where small, code-controlled visual changes can induce drastic shifts in semantics and correct answers. Learning this counterfactual sensitivity requires VLMs to discriminate fine-grained visual differences, yet standard SFT treats training instances independently and provides limited supervision to enforce this behavior. To address this, we introduce ChartCF, a data-efficient training framework designed to enhance counterfactual sensitivity. ChartCF consists of: (1) a counterfactual data synthesis pipeline via code modification, (2) a chart similarity-based data selection strategy that filters overly difficult samples for improved training efficiency, and (3) multimodal preference optimization across both textual and visual modalities. Experiments on five benchmarks show that ChartCF achieves superior or comparable performance to strong chart-specific VLMs while using significantly less training data.