Learning More from Less: Exploiting Counterfactuals for Data-Efficient Chart Understanding
Jianzhu Bao, Haozhen Zhang, Kuicai Dong, Bozhi Wu, Sarthak Ketanbhai Modi, et al.
TLDR
ChartCF improves VLM chart understanding with significantly less training data by leveraging counterfactuals through a novel combination of data synthesis, data selection, and multimodal preference optimization.
Key contributions
- Proposes a counterfactual data synthesis pipeline using code modification to create diverse training examples.
- Implements a chart similarity-based data selection strategy to filter out overly difficult samples for efficiency.
- Utilizes multimodal preference optimization across both textual and visual modalities for robust learning.
- Achieves superior or comparable performance to existing VLMs with significantly less training data.
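The first contribution, counterfactual data synthesis via code modification, can be illustrated with a toy sketch: since a chart is rendered from code, editing a single data value in that code produces a near-identical chart whose correct answer changes. The chart snippet, the `modify_value` helper, and the QA rule below are illustrative assumptions for this digest, not the paper's actual pipeline.

```python
# Toy sketch of counterfactual chart synthesis via code modification
# (assumed example, not ChartCF's real implementation): edit one value in
# the chart-generating code so the same question -- "Which category has
# the tallest bar?" -- gets a different answer on the counterfactual chart.
import ast

ORIGINAL_CHART_CODE = """
import matplotlib.pyplot as plt
categories = ["A", "B", "C"]
values = [30, 55, 20]
plt.bar(categories, values)
"""

def modify_value(code: str, index: int, new_value: int) -> str:
    """Return counterfactual chart code with one data value replaced."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Assign)
                and isinstance(node.targets[0], ast.Name)
                and node.targets[0].id == "values"
                and isinstance(node.value, ast.List)):
            node.value.elts[index] = ast.Constant(new_value)
    return ast.unparse(tree)

def answer_tallest(code: str) -> str:
    """Simulate the QA label: the category holding the largest value."""
    ns = {}
    # Execute only the data-defining lines, skipping imports and plotting.
    data_lines = [l for l in code.splitlines()
                  if "plt" not in l and "import" not in l]
    exec("\n".join(data_lines), ns)
    return max(zip(ns["values"], ns["categories"]))[1]

# Raising the third bar from 20 to 80 flips the answer from "B" to "C".
counterfactual = modify_value(ORIGINAL_CHART_CODE, index=2, new_value=80)
print(answer_tallest(ORIGINAL_CHART_CODE))  # -> B
print(answer_tallest(counterfactual))       # -> C
```

The pair (original chart, counterfactual chart) with its flipped answer is exactly the kind of fine-grained visual contrast that standard SFT, which treats each instance independently, fails to supervise.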
Why it matters
This paper addresses data inefficiency in training VLMs for chart understanding by exploiting the programmatic nature of charts: small, code-controlled edits can drastically change a chart's semantics and correct answer. ChartCF enhances models' sensitivity to such subtle visual changes, leading to more robust and data-efficient learning. This is crucial for developing high-performing chart analysis tools with reduced data needs.
Original Abstract
Vision-Language Models (VLMs) have demonstrated remarkable progress in chart understanding, largely driven by supervised fine-tuning (SFT) on increasingly large synthetic datasets. However, scaling SFT data alone is inefficient and overlooks a key property of charts: charts are programmatically generated visual artifacts, where small, code-controlled visual changes can induce drastic shifts in semantics and correct answers. Learning this counterfactual sensitivity requires VLMs to discriminate fine-grained visual differences, yet standard SFT treats training instances independently and provides limited supervision to enforce this behavior. To address this, we introduce ChartCF, a data-efficient training framework designed to enhance counterfactual sensitivity. ChartCF consists of: (1) a counterfactual data synthesis pipeline via code modification, (2) a chart similarity-based data selection strategy that filters overly difficult samples for improved training efficiency, and (3) multimodal preference optimization across both textual and visual modalities. Experiments on five benchmarks show that ChartCF achieves superior or comparable performance to strong chart-specific VLMs while using significantly less training data.