Unveiling the Backdoor Mechanism Hidden Behind Catastrophic Overfitting in Fast Adversarial Training
Mengnan Zhao, Lihe Zhang, Tianhang Zheng, Bo Wang, Baocai Yin
TLDR
This paper interprets catastrophic overfitting in fast adversarial training as a backdoor mechanism and proposes new mitigation strategies.
Key contributions
- Interprets catastrophic overfitting (CO) in Fast Adversarial Training (FAT) as a backdoor mechanism.
- Unifies CO, backdoor attacks, and unlearnable tasks under a common theoretical framework.
- Proposes parameter recalibration (fine-tuning, linear probing, reinitialization) to mitigate CO.
- Introduces a weight outlier suppression constraint to regulate abnormal model weight deviations.
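The weight outlier suppression idea above can be illustrated with a minimal sketch. This is not the paper's actual constraint; it is a hypothetical penalty (names and the k-sigma threshold are my assumptions) that grows when individual weights deviate abnormally far from the layer's mean:

```python
import math

def outlier_suppression_penalty(weights, k=3.0):
    """Hypothetical regularizer: squared excess of any weight that
    deviates more than k standard deviations from the layer mean."""
    n = len(weights)
    mean = sum(weights) / n
    std = math.sqrt(sum((w - mean) ** 2 for w in weights) / n)
    penalty = 0.0
    for w in weights:
        excess = abs(w - mean) - k * std  # how far past the band the weight sits
        if excess > 0:
            penalty += excess ** 2
    return penalty

# A layer with one abnormally large weight is penalized;
# a layer with evenly spread weights is not.
print(outlier_suppression_penalty([0.1, -0.2, 0.05, 0.0, 5.0], k=1.5))  # > 0
print(outlier_suppression_penalty([0.1, -0.1, 0.1, -0.1], k=1.5))       # 0.0
```

In practice such a term would be added to the training loss so that abnormal weight deviations (which the paper associates with CO) are discouraged during FAT.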
Why it matters
This paper offers a novel, systematic explanation for catastrophic overfitting, a major challenge in adversarial training. By linking it to backdoor mechanisms, it provides a new theoretical lens and practical strategies to improve model robustness.
Original Abstract
Fast Adversarial Training (FAT) has attracted significant attention due to its efficiency in enhancing neural network robustness against adversarial attacks. However, FAT is prone to catastrophic overfitting (CO), wherein models overfit to the specific attack used during training and fail to generalize to others. While existing methods introduce diverse hypotheses and propose various strategies to mitigate CO, a systematic and intuitive explanation of CO remains absent. In this work, we innovatively interpret CO through the lens of backdoors. Through validations on pathway division, diverse feature predictions, and universal class-distinguishable triggers in CO, we conceptualize CO as a weak-trigger variant of unlearnable tasks, unifying CO, backdoor attacks, and unlearnable tasks under a common theoretical framework. Guided by this, we leverage several backdoor-inspired strategies to mitigate CO: (i) Recalibrate CO-affected model parameters using vanilla fine-tuning, linear probing, or reinitialization-based techniques; (ii) Introduce a weight outlier suppression constraint to regulate abnormal deviations in model weights. Extensive experiments support our interpretation of CO and show the efficacy of the proposed mitigation strategies.
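For context on the single-step attacks FAT typically uses, here is a minimal FGSM-style perturbation sketch on a toy linear model with squared loss. All names and the toy model are illustrative assumptions, not from the paper:

```python
def sign(v):
    """Sign of a scalar: -1, 0, or 1."""
    return (v > 0) - (v < 0)

def fgsm_perturb(x, w, y, eps):
    """Single-step FGSM on a toy linear model with squared loss
    L = (w . x - y)^2: move each input coordinate eps along the
    sign of the loss gradient to increase the loss."""
    pred = sum(wi * xi for wi, xi in zip(w, x))
    # dL/dx_i = 2 * (pred - y) * w_i
    grad = [2.0 * (pred - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

x, w, y = [1.0, -1.0], [0.5, 0.5], 1.0
x_adv = fgsm_perturb(x, w, y, eps=0.1)  # perturbed input with higher loss
```

Training only against such a fixed single-step attack is exactly the regime in which CO arises: the model can learn a shortcut ("trigger") tied to the attack rather than genuine robustness.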