Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
Zhangyun Tan, Zeliang Zhang, Susan Liang, Yolo Yunlong Tang, Lisha Chen, et al.
TL;DR
VLM-UnBench benchmarks training-free visual concept unlearning, revealing current prompt-based methods fail to truly erase concepts in VLMs.
Key contributions
- Introduces VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs.
- Evaluates 8 settings and 13 VLM configurations, separating true forgetting from instruction compliance.
- Reveals that realistic unlearning prompts are ineffective; only oracle conditions reduce forget accuracy.
- Shows object and scene concepts are highly resistant, even in strong instruction-tuned models.
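The core evaluation logic behind these findings can be sketched as follows. This is a hypothetical illustration, not the paper's actual code: the condition names, prompt templates, and the stub `toy_model` are all assumptions invented for this sketch. It shows how forget-set accuracy under each prompt condition would be compared against the no-instruction baseline, with the toy model mirroring the headline result that only oracle prompts (which disclose the target concept) suppress it.

```python
# Hypothetical prompt conditions; the templates are illustrative, not the benchmark's.
CONDITIONS = {
    "no_instruction": "",                                        # baseline
    "realistic": "Do not use any knowledge you were asked to forget.",
    "oracle": "You must not recognize or describe {concept}.",   # discloses the target
}

def forget_accuracy(model, probes, instruction):
    """Fraction of forget-set probes the model still answers correctly."""
    correct = sum(model(instruction, question) == gold for question, gold in probes)
    return correct / len(probes)

def evaluate(model, probes, concept):
    """Compare forget accuracy under each condition to the no-instruction baseline."""
    baseline = forget_accuracy(model, probes, CONDITIONS["no_instruction"])
    report = {}
    for name, template in CONDITIONS.items():
        acc = forget_accuracy(model, probes, template.format(concept=concept))
        # Genuine forgetting shows up as a large drop below baseline,
        # not mere compliance with the instruction's surface form.
        report[name] = {"accuracy": acc, "drop": baseline - acc}
    return report

# Toy stand-in model: it ignores realistic forget prompts but complies when the
# prompt names the concept explicitly, echoing the paper's oracle-only finding.
def toy_model(instruction, question):
    if "golden retriever" in instruction:
        return "I cannot answer that."
    return "golden retriever"

probes = [("What breed is the dog in the image?", "golden retriever")] * 4
print(evaluate(toy_model, probes, "golden retriever"))
```

Under this toy setup, the realistic condition leaves forget accuracy at the baseline while only the oracle condition drives it to zero, which is the gap the benchmark is designed to expose.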
Why it matters
VLMs retain sensitive visual concepts, necessitating effective unlearning methods. This paper exposes a critical gap in training-free approaches, demonstrating that prompt-level suppression fails to achieve true visual concept erasure. It provides a robust benchmark to guide future research in this crucial area.
Original Abstract
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing. Training-based unlearning methods share a structural flaw: fine-tuning on a narrow forget set degrades general capabilities before unlearning begins, making it impossible to attribute subsequent performance drops to the unlearning procedure itself. Training-free approaches sidestep this by suppressing concepts through prompts or system instructions, but no rigorous benchmark exists for evaluating them on visual tasks. We introduce VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs. It covers four forgetting levels, 7 source datasets, and 11 concept axes, and pairs a three-level probe taxonomy with five evaluation conditions to separate genuine forgetting from instruction compliance. Across 8 evaluation settings and 13 VLM configurations, realistic unlearning prompts leave forget accuracy near the no-instruction baseline; meaningful reductions appear only under oracle conditions that disclose the target concept to the model. Object and scene concepts are the most resistant to suppression, and stronger instruction-tuned models remain capable despite explicit forget instructions. These results expose a clear gap between prompt-level suppression and true visual concept erasure.