ArXiv TLDR

Delve into the Applicability of Advanced Optimizers for Multi-Task Learning

arXiv:2604.08939

Zhipeng Zhou, Linxiao Cao, Pengcheng Wu, Peilin Zhao, Chunyan Miao

cs.LG

TLDR

APT improves Multi-Task Learning under advanced optimizers, whose momentum lets instant gradients play only a marginal role in parameter updates, by pairing an adaptive momentum mechanism with a light direction-preservation method for better gradient utilization.

Key contributions

  • Identifies that optimization-based MTL methods are hindered under advanced optimizers because instant gradients play only a marginal role in the actual parameter updates.
  • Proposes APT, a framework with a simple adaptive momentum mechanism that balances the strengths of advanced optimizers and MTL (illustrated in the sketch after this list).
  • Introduces a light direction preservation method to facilitate Muon's orthogonalization for better MTL.
  • Demonstrates APT consistently improves existing MTL approaches across four mainstream datasets.
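
To make the first point concrete, here is a minimal numpy sketch, not the paper's algorithm: it shows that an EMA momentum buffer admits the instant gradient with weight only (1 - β), so a carefully de-conflicted MTL gradient is largely washed out by history. The `adaptive_beta` rule at the end is a hypothetical illustration of what an adaptive momentum mechanism could look like, not APT's actual formula.

```python
import numpy as np

rng = np.random.default_rng(0)

beta = 0.9           # a typical momentum coefficient (e.g., Adam's beta1)
m = np.zeros(4)      # EMA momentum buffer

for step in range(100):
    g = rng.normal(size=4)        # stand-in for an MTL-adjusted gradient
    m = beta * m + (1 - beta) * g
    # rough share of the new momentum contributed by the instant gradient
    instant_share = (1 - beta) * np.linalg.norm(g) / (np.linalg.norm(m) + 1e-12)

print(f"instant-gradient share of the final update: {instant_share:.2f}")

def adaptive_beta(m, g, beta_max=0.9):
    """Hypothetical rule (an assumption, not the paper's): trust the momentum
    history less when it disagrees with the instant, de-conflicted gradient."""
    denom = np.linalg.norm(m) * np.linalg.norm(g) + 1e-12
    cos_sim = float(m @ g) / denom       # alignment in [-1, 1]
    return beta_max * max(cos_sim, 0.0)  # beta shrinks toward 0 when misaligned

print(f"adaptive beta for the last step: {adaptive_beta(m, g):.2f}")
```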

Why it matters

This paper addresses a critical limitation of Multi-Task Learning (MTL): under advanced optimizers, the instant gradients that MTL methods carefully de-conflict and re-balance play only a marginal role in the actual parameter updates. By proposing APT, a novel framework, it enables MTL to better exploit these optimizers, leading to significant performance gains and making MTL more effective and robust.

Original Abstract

Multi-Task Learning (MTL) is a foundational machine learning problem that has seen extensive development over the past decade. Recently, various optimization-based MTL approaches have been proposed to learn multiple tasks simultaneously by altering the optimization trajectory. Although these methods strive to de-conflict and re-balance tasks, we empirically identify that their effectiveness is often undermined by an overlooked factor when employing advanced optimizers: the instant-derived gradients play only a marginal role in the actual parameter updates. This discrepancy prevents MTL frameworks from fully releasing their power on learning dynamics. Furthermore, we observe that Muon, a recently emerged advanced optimizer, inherently functions as a multi-task learner, which underscores the critical importance of the gradients used for its orthogonalization. To address these issues, we propose APT (Applicability of advanced oPTimizers), a framework featuring a simple adaptive momentum mechanism designed to balance the strengths between advanced optimizers and MTL. Additionally, we introduce a light direction preservation method to facilitate Muon's orthogonalization. Extensive experiments across four mainstream MTL datasets demonstrate that APT consistently augments existing MTL approaches, yielding substantial performance improvements.
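
Since the abstract notes that Muon "inherently functions as a multi-task learner" through its orthogonalization step, a short sketch of that step may help. The Newton-Schulz iteration and its coefficients below follow the public Muon reference implementation; the light direction-preservation method APT layers on top is not reproduced here, since its details are beyond this summary.

```python
import numpy as np

def newton_schulz_orthogonalize(G, steps=5):
    """Approximately map G toward the nearest orthogonal factor (U @ V^T of
    its SVD) via the quintic Newton-Schulz iteration used by Muon.
    Coefficients follow the public reference implementation."""
    a, b, c = 3.4445, -4.7750, 2.0315
    X = G / (np.linalg.norm(G) + 1e-7)   # normalize so the iteration converges
    transposed = X.shape[0] > X.shape[1]
    if transposed:                       # iterate on the wide orientation
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * A @ A) @ X
    return X.T if transposed else X

M = np.random.default_rng(1).normal(size=(8, 16))  # stand-in momentum matrix
O = newton_schulz_orthogonalize(M)
print(np.round(O @ O.T, 2))  # approximately the identity matrix
```

Because orthogonalization equalizes the singular directions of the momentum matrix, whichever gradients feed that matrix effectively set the balance between tasks, which is why the abstract stresses the gradients used for Muon's orthogonalization.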

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.