ArXiv TLDR

Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces

arXiv:2605.02829

Jingze Ge, Yun Liu, Xue Geng, Wanqi Dong, Wang Zhe Mark, et al.

cs.AI

TLDR

JACTUS unifies model compression and task adaptation in a single optimization, outperforming sequential compress-then-fine-tune pipelines while enabling efficient, robust tuning.

Key contributions

  • Introduces JACTUS, a framework unifying model compression and parameter-efficient adaptation.
  • Mitigates misalignment between the compressed subspace and downstream objectives through joint optimization.
  • Reaches 89.2% average accuracy across eight ViT-Base vision tasks at 80% retained parameters, beating 100%-parameter PEFT baselines (e.g., DoRA at 87.9%).
  • Scores 80.9% average on Llama2-7B commonsense QA at the same 80% retained-parameter budget, exceeding 100%-parameter PEFT baselines (e.g., DoRA at 79.7%).

Why it matters

Current methods decouple compression from adaptation, which can misalign the compressed subspace with downstream objectives and squander the global parameter budget. JACTUS offers a more efficient and effective approach by integrating these processes, enabling the deployment of smaller models that match or exceed full-parameter PEFT. This is crucial for scaling large models in resource-constrained environments.

Original Abstract

Adapting large pretrained models to diverse tasks is now routine, yet the two dominant strategies of parameter-efficient fine-tuning (PEFT) and low-rank compression are typically composed in sequence. This decoupled practice first compresses and then fine-tunes adapters, potentially misaligning the compressed subspace with downstream objectives and squandering a global parameter budget. To overcome this limitation, we introduce JACTUS (Joint Adaptation and Compression with a Task-aware Union of Subspaces), a single framework that unifies compression and adaptation. From a small calibration set, JACTUS estimates input and pre-activation gradient covariances, forms their orthogonal union with the pretrained weight subspace, performs a projected low-rank approximation inside this union, allocates rank globally by marginal gain per parameter, and trains only a compact core matrix. This explicitly mitigates the potential misalignment between the compressed subspace and downstream objectives by coupling the directions preserved for compression with those required for adaptation, yielding a deployable low-rank model that avoids retaining full frozen weights while enabling fast and robust tuning. On vision, JACTUS attains an average 89.2% accuracy on ViT-Base across eight datasets at 80% retained parameters, surpassing strong 100% PEFT baselines (e.g., DoRA 87.9%). On language, JACTUS achieves an 80.9% average on Llama2-7B commonsense QA at the same 80% retained-parameter budget, outperforming 100% PEFT (e.g., DoRA 79.7%) and exceeding prior compress-then-finetune pipelines under the same retained-parameter budget. We will release code.
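
The abstract packs the whole recipe into one sentence, so a minimal PyTorch sketch of the subspace-union and projected low-rank steps may help. Everything below — function names, the subspace size `k`, the rank `r`, and the diagonal-core layout — is an illustrative assumption (the authors' code is not yet released), not their implementation.

```python
# Illustrative sketch only: names, sizes, and the factorization layout
# below are assumptions, not the authors' released implementation.
import torch

def topk_eigvecs(C, k):
    # Leading eigenvectors of a symmetric PSD covariance estimate.
    _, evecs = torch.linalg.eigh(C)      # eigenvalues in ascending order
    return evecs[:, -k:]                 # (d, k)

def orthogonal_union(*bases):
    # Stack candidate directions and re-orthonormalize with a thin QR.
    Q, _ = torch.linalg.qr(torch.cat(bases, dim=1), mode="reduced")
    return Q                             # orthonormal union basis

@torch.no_grad()
def jactus_factorize(W, C_x, C_g, k=64, r=48):
    """W: pretrained weight (out x in); C_x: input covariance (in x in)
    estimated on a calibration set; C_g: pre-activation gradient
    covariance (out x out). Returns frozen factors and a trainable core."""
    U_w, _, Vh_w = torch.linalg.svd(W, full_matrices=False)
    # Orthogonal union of the weight subspace with task-aware directions.
    V = orthogonal_union(Vh_w.T[:, :k], topk_eigvecs(C_x, k))  # input side
    U = orthogonal_union(U_w[:, :k], topk_eigvecs(C_g, k))     # output side
    # Projected low-rank approximation inside the union.
    core = U.T @ W @ V
    A, S, Bh = torch.linalg.svd(core, full_matrices=False)
    left = U @ A[:, :r]                  # frozen (out x r)
    right = Bh[:r] @ V.T                 # frozen (r x in)
    core_r = torch.nn.Parameter(torch.diag(S[:r]))  # trainable (r x r)
    return left, core_r, right           # W ~= left @ core_r @ right

W = torch.randn(768, 768)
left, core_r, right = jactus_factorize(W, torch.eye(768), torch.eye(768))
```

Only the small r × r core is trained; the frozen factors carry both the compression directions and the task-aware ones, so no full frozen weight needs to be retained at deployment.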
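The remaining step is spreading a global rank budget across layers. The greedy score below — the squared next singular value of each layer's projected core divided by the parameters one extra rank costs — is one plausible reading of "marginal gain per parameter"; the paper's exact criterion may differ.

```python
import torch

def allocate_ranks(svals_per_layer, shapes, budget):
    """Greedy global allocation: repeatedly grant one more rank to the
    layer whose next singular direction buys the most energy per added
    parameter. `shapes` holds (out, in) per layer; `budget` is the total
    parameter budget across all low-rank factors."""
    ranks, used = [0] * len(svals_per_layer), 0
    while True:
        best_gain, best_l = -1.0, -1
        for l, (s, (m, n)) in enumerate(zip(svals_per_layer, shapes)):
            r, cost = ranks[l], m + n    # params added by one extra rank
            if r < len(s) and used + cost <= budget:
                gain = float(s[r] ** 2) / cost
                if gain > best_gain:
                    best_gain, best_l = gain, l
        if best_l < 0:
            break                        # budget exhausted or ranks maxed
        ranks[best_l] += 1
        used += sum(shapes[best_l])
    return ranks

# Toy example: two layers competing for a shared parameter budget.
svals = [torch.tensor([4.0, 2.0, 1.0]), torch.tensor([3.0, 0.5])]
print(allocate_ranks(svals, [(768, 768), (768, 3072)], budget=6000))
```

A per-parameter score like this naturally favors small layers with sharp spectra, which is what makes a single global budget behave differently from a fixed per-layer rank.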
