ArXiv TLDR

Parameter Importance is Not Static: Evolving Parameter Isolation for Supervised Fine-Tuning

arXiv: 2604.14010

Zekai Lin, Chao Xue, Di Liang, Xingsheng Han, Peiyang Liu + 6 more

cs.LG, cs.CL

TLDR

EPI dynamically isolates critical parameters during SFT, reducing interference and forgetting by adapting to evolving parameter importance.

Key contributions

  • Demonstrates that parameter importance in SFT is dynamic, not static, and drifts over training.
  • Introduces Evolving Parameter Isolation (EPI) to adaptively update parameter isolation masks.
  • EPI uses online gradient signals to protect emerging critical parameters and release outdated ones (see the sketch after this list).
  • Significantly reduces task interference and catastrophic forgetting, improving generalization.
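To make the mask-update loop concrete, here is a minimal PyTorch-style sketch of an EPI-like training hook. It assumes importance is estimated as an exponential moving average of gradient magnitudes and that "isolation" means zeroing gradients on the protected subset; the class name, the top-k protection rule, and all hyperparameters are illustrative stand-ins, not the paper's exact algorithm.

```python
import torch

class EvolvingIsolation:
    """Sketch of an EPI-style hook: periodically re-derives which parameters
    to protect, based on an online (EMA) gradient-magnitude importance score."""

    def __init__(self, model, protect_ratio=0.05, refresh_every=500, ema_beta=0.9):
        self.model = model
        self.protect_ratio = protect_ratio    # fraction of weights to isolate (assumed)
        self.refresh_every = refresh_every    # steps between mask refreshes (assumed)
        self.ema_beta = ema_beta
        # Running importance estimate per parameter tensor.
        self.importance = {n: torch.zeros_like(p) for n, p in model.named_parameters()}
        # Mask: 1 = trainable, 0 = isolated (protected from updates).
        self.mask = {n: torch.ones_like(p) for n, p in model.named_parameters()}
        self.step = 0

    @torch.no_grad()
    def after_backward(self):
        """Call after loss.backward() and before optimizer.step()."""
        self.step += 1
        for n, p in self.model.named_parameters():
            if p.grad is None:
                continue
            # Online importance signal: EMA of gradient magnitude (assumed estimator).
            self.importance[n].mul_(self.ema_beta).add_(p.grad.abs(), alpha=1 - self.ema_beta)
            # Zero gradients on currently isolated parameters.
            p.grad.mul_(self.mask[n])
        if self.step % self.refresh_every == 0:
            self._refresh_masks()

    @torch.no_grad()
    def _refresh_masks(self):
        """Protect the currently most important weights; release the rest,
        so parameters whose importance has faded regain plasticity."""
        for n, score in self.importance.items():
            k = max(1, int(self.protect_ratio * score.numel()))
            # Threshold at the k-th largest importance value in this tensor.
            thresh = score.flatten().kthvalue(score.numel() - k + 1).values
            self.mask[n] = (score < thresh).to(score.dtype)
```

In a training loop this would sit between `loss.backward()` and `optimizer.step()`. The part that distinguishes the evolving approach from static isolation is the periodic `_refresh_masks()` call, which lets the protected set track the drift in parameter importance instead of staying fixed after an initial identification pass.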

Why it matters

This paper addresses a key limitation of SFT by showing that parameter importance isn't fixed during training. EPI offers a dynamic alternative to static isolation, improving the stability and performance of LLM fine-tuning, which matters for building more robust and adaptable multi-task models.

Original Abstract

Supervised Fine-Tuning (SFT) of large language models often suffers from task interference and catastrophic forgetting. Recent approaches alleviate this issue by isolating task-critical parameters during training. However, these methods represent a static solution to a dynamic problem, assuming that parameter importance remains fixed once identified. In this work, we empirically demonstrate that parameter importance exhibits temporal drift over the course of training. To address this, we propose Evolving Parameter Isolation (EPI), a fine-tuning framework that adapts isolation decisions based on online estimates of parameter importance. Instead of freezing a fixed subset of parameters, EPI periodically updates isolation masks using gradient-based signals, enabling the model to protect emerging task-critical parameters while releasing outdated ones to recover plasticity. Experiments on diverse multi-task benchmarks demonstrate that EPI consistently reduces interference and forgetting compared to static isolation and standard fine-tuning, while improving overall generalization. Our analysis highlights the necessity of synchronizing isolation mechanisms with the evolving dynamics of learning diverse abilities.
