ArXiv TLDR

Auto-FlexSwitch: Efficient Dynamic Model Merging via Learnable Task Vector Compression

2604.28109

Junqi Gao, Dazhi Zhang, Zhichang Guo, Biqing Qi, Yi Ran + 1 more

cs.LG

TLDR

Auto-FlexSwitch merges models dynamically and efficiently for multi-task adaptation by compressing task vectors with learnable sparsification and quantization.

Key contributions

  • Proposes T-Switch, decomposing task vectors into compact components for high compression ratios.
  • Introduces FlexSwitch, a learnable framework for adaptive sparsification, quantization, and storage.
  • Develops Auto-FlexSwitch, combining FlexSwitch with KNN inference for efficient dynamic model merging.
  • Addresses prohibitive storage overhead in dynamic merging by compressing task-specific parameters.
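The T-Switch decomposition above can be sketched in a few lines. This is a hedged illustration, not the paper's implementation: it keeps the top-magnitude entries of a task vector (the fine-tuned weights minus the base weights), stores only a binary mask, a sign vector, and one least-squares scalar, and reconstructs an approximation from those three parts. The `keep_ratio` parameter and the helper names are assumptions for illustration.

```python
import numpy as np

def t_switch_compress(tau, keep_ratio=0.1):
    """Decompose a task vector into (binary mask, sign vector, scale).

    Sketch of the T-Switch idea: keep only the largest-magnitude
    entries, store their signs, and fit one scalar that minimizes the
    L2 reconstruction error for the fixed mask and signs.
    """
    flat = tau.ravel()
    k = max(1, int(keep_ratio * flat.size))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # top-k by magnitude
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    signs = np.sign(flat[mask]).astype(np.int8)    # 1 bit each in practice
    alpha = float(np.abs(flat[mask]).mean())       # least-squares scale
    return mask, signs, alpha

def t_switch_decompress(mask, signs, alpha, shape):
    """Rebuild the approximate task vector from the compact parts."""
    flat = np.zeros(mask.size, dtype=np.float32)
    flat[mask] = alpha * signs
    return flat.reshape(shape)
```

With a 10% keep ratio, the stored payload is roughly one bit per parameter for the mask plus one bit per kept entry for the sign, which is where the high compression ratios come from.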

Why it matters

Dynamic model merging offers high performance for multi-task adaptation but suffers from prohibitive storage overhead. This paper provides a novel solution by introducing learnable compression for task vectors, significantly reducing storage requirements. This makes dynamic merging more practical and scalable for real-world applications.

Original Abstract

Model merging has attracted attention as an effective path toward multi-task adaptation by integrating knowledge from multiple task-specific models. Among existing approaches, dynamic merging mitigates performance degradation caused by conflicting parameter updates across tasks by flexibly combining task-specific parameters at inference time, thereby maintaining high performance. However, these methods require storing independent parameters for each task, resulting in prohibitive storage overhead. To address this issue, we first experimentally demonstrate that the fine-tuned weight increments (referred to as task vectors) exhibit an impulse-like activation pattern and high robustness to low-bit representations. Driven by this insight, we propose T-Switch, which decomposes task vectors into three compact components: a binary sparse mask, a sign vector, and a scalar scaling factor, achieving high-fidelity approximation at high compression ratios. Building on this, we develop Auto-Switch, a training-free merging scheme that automatically assembles task vectors through feature similarity retrieval. Furthermore, to transform task vector sparsification and quantization from static rules into adaptive learning, we propose FlexSwitch, a learnable framework that jointly optimizes the compression strategy for each model unit via Learnable Gating Sparsification (LGS) and Bit-width Adaptive Selection (BAS), while employing the Sparsity-Aware Storage Strategy (SASS) to select the optimal storage encoding structure. Finally, by incorporating a K-Nearest Neighbor (KNN) inference scheme with a learnable low-rank metric, we present Auto-FlexSwitch, a dynamic model merging approach that supports highly efficient task vector compression.
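The retrieval-and-merge step at inference time can be illustrated with a minimal sketch. This assumes the method keeps one feature prototype per task and selects the k most similar tasks for each input; the prototype construction, the learnable low-rank metric, and the helper names (`retrieve_tasks`, `merge_weights`) are assumptions, not the paper's API.

```python
import numpy as np

def retrieve_tasks(feature, prototypes, k=3):
    """Select the k tasks whose prototypes are most similar to the input
    feature (cosine similarity), and turn similarities into softmax
    weights. A hedged sketch of the KNN retrieval step."""
    f = feature / (np.linalg.norm(feature) + 1e-8)
    P = prototypes / (np.linalg.norm(prototypes, axis=1, keepdims=True) + 1e-8)
    sims = P @ f
    top = np.argsort(-sims)[:k]
    e = np.exp(sims[top] - sims[top].max())
    weights = e / e.sum()
    return top, weights

def merge_weights(base, task_vectors, top, weights):
    """Compose merged weights as base + weighted sum of retrieved
    (decompressed) task vectors."""
    merged = base.copy()
    for i, w in zip(top, weights):
        merged += w * task_vectors[i]
    return merged
```

Because only the retrieved task vectors are decompressed and added per input, the base model is shared across all tasks and the per-task cost is just the compact compressed representation.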
