JumpLoRA: Sparse Adapters for Continual Learning in Large Language Models
Alexandra Dragomir, Ioana Pintilie, Antonio Barbalau, Marius Dragoi, Florin Brad, and 4 more authors
TLDR
JumpLoRA introduces sparse LoRA adapters gated by JumpReLU for continual learning in LLMs, mitigating catastrophic forgetting and outperforming the state-of-the-art method ELLA.
Key contributions
- Proposes JumpLoRA, a novel framework for sparse LoRA adapters in LLMs using JumpReLU gating (a minimal sketch of the gate follows this list).
- Achieves dynamic parameter isolation to effectively prevent catastrophic forgetting and task interference.
- Highly modular and compatible with existing LoRA-based continual learning methods such as IncLoRA.
- Significantly boosts IncLoRA's performance and outperforms the state-of-the-art continual learning method ELLA.
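JumpReLU itself is a thresholded activation: values above a threshold pass through unchanged and everything below it is zeroed, which is what induces sparsity in the gated adapter. Below is a minimal PyTorch sketch of the forward pass; the function name and tensor-valued threshold are illustrative assumptions, not the paper's implementation.

```python
import torch

def jump_relu(x: torch.Tensor, theta: torch.Tensor) -> torch.Tensor:
    """JumpReLU: x * Heaviside(x - theta).
    Entries of x strictly above the threshold theta pass through unchanged;
    all other entries are zeroed, giving sparse activations."""
    return x * (x > theta).to(x.dtype)
```

The hard threshold provides no gradient signal to theta, so learning the threshold in practice typically relies on a straight-through-style estimator; the sketch above only covers the forward pass.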
Why it matters
Continual learning in LLMs is crucial but suffers from catastrophic forgetting. JumpLoRA addresses this by dynamically isolating the parameters each task updates, strengthening existing LoRA-based approaches such as IncLoRA and outperforming the leading state-of-the-art method, ELLA.
Original Abstract
Adapter-based methods have become a cost-effective approach to continual learning (CL) for Large Language Models (LLMs), by sequentially learning a low-rank update matrix for each task. To mitigate catastrophic forgetting, state-of-the-art approaches impose constraints on new adapters with respect to the previous ones, by targeting either subspace or coordinate-wise interference. In this paper, we propose JumpLoRA, a novel framework to adaptively induce sparsity in the Low-Rank Adaptation (LoRA) blocks through the use of JumpReLU gating. The method achieves dynamic parameter isolation, which helps prevent task interference. We demonstrate that our method is highly modular and compatible with LoRA-based CL approaches. Specifically, it significantly boosts the performance of IncLoRA and outperforms the leading state-of-the-art CL method, ELLA.
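To make the mechanism concrete, here is a hedged sketch of how a JumpReLU gate could sit inside a LoRA block: the low-rank activations are thresholded before the up-projection, so only a subset of rank directions contributes to each update. The class name, gate placement, and hyperparameters are assumptions for illustration, not the JumpLoRA reference implementation.

```python
import torch
import torch.nn as nn

class JumpGatedLoRALinear(nn.Module):
    """Frozen base linear layer plus a low-rank (LoRA) update whose
    intermediate activations are gated by a JumpReLU-style threshold.
    Illustrative sketch only, not the paper's released code."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                   # pretrained weights stay frozen
        self.lora_A = nn.Linear(base.in_features, rank, bias=False)
        self.lora_B = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)            # adapter starts as a zero update
        self.theta = nn.Parameter(torch.zeros(rank))  # per-direction gate thresholds
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.lora_A(x)                            # project into the rank-r subspace
        h = h * (h > self.theta).to(h.dtype)          # JumpReLU gate: drop sub-threshold directions
        return self.base(x) + self.scaling * self.lora_B(h)
```

Because the gate zeroes different rank directions for different inputs, each task effectively writes to a sparse subset of the adapter, which is one plausible reading of the dynamic parameter isolation described in the abstract.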