Beyond Augmented-Action Surrogates for Multi-Expert Learning-to-Defer

April 10, 2026 · arXiv:2604.09414

Yannis Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

stat.ML, cs.LG

TLDR

This paper introduces Learning-to-Defer with advice and proposes an augmented surrogate that is provably consistent: it recovers the optimal joint policy for routing each input to an expert and choosing what additional information that expert receives.

Key contributions

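Introduces "Learning-to-Defer with advice" (LtDA), where, after selecting an expert, the system can also choose what additional information (e.g., retrieved documents, tool outputs, escalation context) that expert receives.
Shows that a broad family of natural "separated surrogates", which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting.
Proposes an "augmented surrogate" over the composite expert-advice action space and proves an H-consistency guarantee together with an excess-risk transfer bound.
Demonstrates improvements over standard LtD on tabular, language, and multi-modal tasks, with advice acquisition adapting to the cost regime; a synthetic benchmark confirms the predicted failure of separated surrogates.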
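To make the composite action space concrete, here is a minimal Python sketch for a single input with 2 experts and 2 advice options. The cost table is hypothetical, and it assumes each entry already bundles the expert's expected error with any consultation or advice-acquisition cost. The helper names `composite_actions` and `bayes_optimal_action` are illustrative, not taken from the paper. The Bayes-optimal decision picks the (expert, advice) pair with the lowest expected cost.

```python
from itertools import product


def composite_actions(num_experts: int, num_advice: int) -> list[tuple[int, int]]:
    """Enumerate every (expert, advice) pair in the composite action space."""
    return list(product(range(num_experts), range(num_advice)))


def bayes_optimal_action(expected_cost: dict[tuple[int, int], float]) -> tuple[int, int]:
    """Return the (expert, advice) pair with the lowest expected cost for one input."""
    return min(expected_cost, key=expected_cost.get)


# Hypothetical expected costs for one input x (illustrative numbers only).
# Advice 0 = no extra information; advice 1 = e.g. retrieved documents.
costs = {
    (0, 0): 0.40,  # expert 0, no advice
    (0, 1): 0.25,  # expert 0, with retrieved documents
    (1, 0): 0.30,  # expert 1, no advice
    (1, 1): 0.35,  # expert 1, with retrieved documents
}

assert set(costs) == set(composite_actions(2, 2))
print(bayes_optimal_action(costs))  # (0, 1)
```

In this toy table, expert 1 is cheaper when no advice is given (0.30 vs 0.40), yet the best pair is expert 0 with advice. The routing and advice decisions are coupled, which is why scoring them jointly over pairs is a natural design.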

Why it matters

This paper addresses a key limitation of traditional Learning-to-Defer, which assumes the information available to each expert is fixed at decision time, by also letting the system choose what additional information the selected expert receives. It provides a surrogate with consistency guarantees and empirical gains for jointly optimizing expert routing and information acquisition in modern AI systems.

Original Abstract

Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert–advice action space and prove an $\mathcal{H}$-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular, language, and multi-modal tasks show that the resulting method improves over standard Learning-to-Defer while adapting its advice-acquisition behavior to the cost regime; a synthetic benchmark confirms the failure mode predicted for separated surrogates.
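The abstract does not spell out the exact form of the augmented surrogate, so the following is only a sketch under a stated assumption: a cost-weighted cross-entropy over the flattened composite action space, with one score per (expert, advice) pair and costs assumed to lie in [0, 1]. The function names, the weighting `1 - cost`, and the cost values are all hypothetical. The sketch shows the key property of scoring pairs jointly: minimizing this loss drives the highest score to the lowest-cost pair.

```python
import numpy as np


def log_softmax(scores: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over all composite actions."""
    shifted = scores - scores.max()  # subtract the max to avoid overflow
    return shifted - np.log(np.exp(shifted).sum())


def augmented_surrogate(scores: np.ndarray, costs: np.ndarray) -> float:
    """Hypothetical cost-weighted cross-entropy over K = num_experts * num_advice pairs.

    Assumes costs in [0, 1]; cheaper (expert, advice) pairs get larger weight.
    """
    weights = 1.0 - costs
    return float(-(weights * log_softmax(scores)).sum())


def augmented_surrogate_grad(scores: np.ndarray, costs: np.ndarray) -> np.ndarray:
    """Gradient of the loss above w.r.t. scores: sum(w) * softmax(scores) - w."""
    weights = 1.0 - costs
    return weights.sum() * np.exp(log_softmax(scores)) - weights


NUM_ADVICE = 2
# Hypothetical expected costs for one input, flattened as index = expert * NUM_ADVICE + advice.
expected_costs = np.array([0.40, 0.25, 0.30, 0.35])

# Plain gradient descent on the scores for this single input.
scores = np.zeros_like(expected_costs)
for _ in range(500):
    scores -= 0.5 * augmented_surrogate_grad(scores, expected_costs)

expert, advice = divmod(int(np.argmax(scores)), NUM_ADVICE)
print(expert, advice)  # 0 1
```

For this assumed loss, the minimizing softmax distribution is proportional to `1 - cost`, so the argmax of the learned scores is the lowest-cost (expert, advice) pair. A separated design would instead use one head for routing and another for advice; the paper shows that a broad family of such surrogates is inconsistent.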
