ArXiv TLDR

Fine-Tuning Regimes Define Distinct Continual Learning Problems

2604.21927

Paul-Tiberiu Iordache, Elena Burceanu

cs.LG

TLDR

Fine-tuning regimes, specifically trainable depth, are critical variables in continual learning, influencing method performance and comparative rankings.

Key contributions

  • Argues fine-tuning regimes, defined by trainable depth, are crucial evaluation variables in continual learning.
  • Formalizes adaptation regimes, showing trainable depth alters update signals for task fitting and knowledge preservation.
  • Finds that relative rankings of standard CL methods are not consistently preserved across different regimes.
  • Finds that deeper adaptation regimes correlate with larger update magnitudes, increased forgetting, and a stronger relationship between update magnitude and forgetting.

Why it matters

This paper reveals a critical flaw in current continual learning benchmarks by demonstrating that method comparisons depend heavily on the chosen fine-tuning regime. It highlights that existing evaluations might be misleading, urging the CL community to adopt regime-aware protocols for more reliable and robust assessments.

Original Abstract

Continual learning (CL) studies how models acquire tasks sequentially while retaining previously learned knowledge. Despite substantial progress in benchmarking CL methods, comparative evaluations typically keep the fine-tuning regime fixed. In this paper, we argue that the fine-tuning regime, defined by the trainable parameter subspace, is itself a key evaluation variable. We formalize adaptation regimes as projected optimization over fixed trainable subspaces, showing that changing the trainable depth alters the effective update signal through which both current task fitting and knowledge preservation operate. This analysis motivates the hypothesis that method comparisons need not be invariant across regimes. We test this hypothesis in task incremental CL, five trainable depth regimes, and four standard methods: online EWC, LwF, SI, and GEM. Across five benchmark datasets, namely MNIST, Fashion MNIST, KMNIST, QMNIST, and CIFAR-100, and across 11 task orders per dataset, we find that the relative ranking of methods is not consistently preserved across regimes. We further show that deeper adaptation regimes are associated with larger update magnitudes, higher forgetting, and a stronger relationship between the two. These results show that comparative conclusions in CL can depend strongly on the chosen fine-tuning regime, motivating regime-aware evaluation protocols that treat trainable depth as an explicit experimental factor.
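The paper's framing of adaptation regimes as projected optimization over a fixed trainable subspace can be illustrated with a toy sketch: a model's layer weights receive a gradient step only on the last `depth` layers, with the rest frozen. This is an illustrative reconstruction of the idea, not code from the paper; the function and variable names are invented here.

```python
# Toy sketch of "trainable depth" as projected optimization:
# a model with L scalar layer weights, where a regime of depth d
# updates only the last d layers and freezes the rest.
# All names are illustrative, not from the paper.

def sgd_step(weights, grads, depth, lr=0.1):
    """Apply a gradient step projected onto the trainable subspace:
    only the last `depth` layers receive updates."""
    frozen = len(weights) - depth
    return [
        w if i < frozen else w - lr * g
        for i, (w, g) in enumerate(zip(weights, grads))
    ]

def update_norm(old, new):
    """Euclidean magnitude of the effective update."""
    return sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5

weights = [1.0, 1.0, 1.0, 1.0]
grads = [0.5, 0.5, 0.5, 0.5]

# Shallow regime: only the final layer adapts.
shallow = sgd_step(weights, grads, depth=1)
# Deep regime: all layers adapt; the effective update is larger,
# mirroring the paper's finding that deeper regimes produce
# larger update magnitudes (and, empirically, more forgetting).
deep = sgd_step(weights, grads, depth=4)

print(update_norm(weights, shallow))  # 0.05
print(update_norm(weights, deep))     # 0.1
```

Even in this toy setting, the two regimes apply different effective update signals to the same gradient, which is why a CL method's behavior need not transfer across regimes.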
