Task Alignment: A simple and effective proxy for model merging in computer vision

April 14, 2026 | arXiv:2604.12935

Pau de Jorge, César Roberto de Souza, Björn Michele, Mert Bülent Sarıyıldız, Philippe Weinzaepfel + 3 more

cs.CV

TLDR

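This paper introduces the "task alignment proxy," a way to select model merging hyperparameters in computer vision without training decoders for every candidate. It speeds up hyperparameter selection by orders of magnitude while retaining performance, and extends merging to multi-task vision models.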

Key contributions

Introduces the "task alignment proxy" for efficient model merging in computer vision.
Addresses the costly hyperparameter selection problem for models with heterogeneous decoders.
Speeds up hyperparameter selection by orders of magnitude while maintaining performance.
Extends the applicability of model merging to multi-task vision models beyond CLIP-based classification.

Why it matters

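Efficiently merging models fine-tuned from the same pretrained base is of great practical interest in computer vision. Most existing evaluations are restricted to CLIP-based image classification with frozen decoders, where a merged model can be evaluated right away. In most vision scenarios, tasks rely on trainable and usually heterogeneous decoders, so selecting hyperparameters by downstream performance requires costly decoder training for each candidate. This paper makes model merging more practical in these multi-task settings and saves significant computational resources.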
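To make the notion of a merging hyperparameter concrete, here is a minimal sketch of one common merging scheme, task arithmetic, in which a single scaling coefficient controls how strongly the fine-tuned weight differences are added to the base model. This is an illustrative assumption, not necessarily the merging method used in the paper, and it does not implement the task alignment proxy, which the summary above does not define. The function name `merge_task_arithmetic`, the parameter key `encoder.weight`, and the toy weights are all hypothetical.

```python
import numpy as np

def merge_task_arithmetic(base, finetuned, scale):
    """Merge models as base + scale * sum_i (finetuned_i - base).

    `base` and each entry of `finetuned` map parameter names to arrays.
    `scale` is the merging hyperparameter that has to be selected.
    """
    merged = {}
    for name, w0 in base.items():
        # Sum of task vectors (fine-tuned weights minus base weights).
        task_vector_sum = sum(ft[name] - w0 for ft in finetuned)
        merged[name] = w0 + scale * task_vector_sum
    return merged

# Toy shared encoder weights (hypothetical values, for illustration only).
base = {"encoder.weight": np.zeros((2, 2))}
model_a = {"encoder.weight": np.full((2, 2), 1.0)}
model_b = {"encoder.weight": np.full((2, 2), 3.0)}

# Each candidate scale yields a different merged encoder.
candidates = {
    s: merge_task_arithmetic(base, [model_a, model_b], s)
    for s in (0.25, 0.5, 1.0)
}
```

With trainable decoders, scoring each entry of `candidates` by downstream performance would require training decoders on top of every merged encoder, which is the cost the paper's proxy is meant to avoid.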

Original Abstract

Efficiently merging several models fine-tuned for different tasks, but stemming from the same pretrained base model, is of great practical interest. Despite extensive prior work, most evaluations of model merging in computer vision are restricted to image classification using CLIP, where different classification datasets define different tasks. In this work, our goal is to make model merging more practical and show its relevance on challenging scenarios beyond this specific setting. In most vision scenarios, different tasks rely on trainable and usually heterogeneous decoders. Differently from previous studies with frozen decoders, where merged models can be evaluated right away, the non-trivial cost of decoder training renders hyperparameter selection based on downstream performance impractical. To address this, we introduce the task alignment proxy, and show how it can be used to speed up hyperparameter selection by orders of magnitude while retaining performance. Equipped with the task alignment proxy, we extend the applicability of model merging to multi-task vision models beyond CLIP-based classification.


Related papers