RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment
Yingfeng Luo, Hongyu Liu, Dingyang Lin, Kaiyan Chang, Chenglong Wang + 4 more
TLDR
RouteLMT is an in-model router that optimizes hybrid LLM translation deployment by predicting marginal gain to efficiently route samples, balancing cost and quality.
Key contributions
- Formulates LLM translation routing as a budget allocation problem based on marginal gain.
- Introduces RouteLMT, an in-model router that predicts this gain from the small model's prompt-token representations.
- Routes samples without external models or hypothesis decoding, improving efficiency.
- Achieves superior quality-budget trade-offs over heuristics and baselines.
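The budget-allocation view in the contributions above can be sketched in a few lines: given predicted marginal gains, escalate the top fraction of samples to the large model. This is a minimal illustration, not the paper's implementation; the function name and the use of a simple top-k cutoff are assumptions.

```python
import numpy as np

def route_by_predicted_gain(predicted_gains, budget_fraction):
    """Route the top-k samples (ranked by predicted marginal gain)
    to the large model, where k is fixed by the budget fraction.
    Returns a boolean mask: True = send to the large model."""
    gains = np.asarray(predicted_gains)
    n = len(gains)
    k = int(budget_fraction * n)          # how many escalations the budget allows
    order = np.argsort(gains)[::-1]       # indices sorted by descending gain
    to_large = np.zeros(n, dtype=bool)
    to_large[order[:k]] = True
    return to_large
```

Under this framing, the router only needs a reliable ranking of samples by expected gain; absolute quality scores are unnecessary.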
Why it matters
Deploying LLMs for machine translation is costly. RouteLMT offers a novel, efficient routing strategy for hybrid systems, optimizing the balance between translation quality and computational budget. This makes high-quality LLM-based translation more practical and scalable for real-world applications.
Original Abstract
Large Language Models (LLMs) have achieved remarkable performance in Machine Translation (MT), but deploying them at scale remains prohibitively expensive. A widely adopted remedy is the hybrid system paradigm, which balances cost and quality by serving most requests with a small model and selectively routing a fraction to a large model. However, existing routing strategies often rely on heuristics, external predictors, or absolute quality estimation, which fail to capture whether the large model actually provides a worthwhile improvement over the small one. In this paper, we formulate routing as a budget allocation problem and identify marginal gain, i.e., the large model's improvement over the small model, as the optimal signal for budgeted decisions. Building on this, we propose **RouteLMT** (routing for LLM-based MT), an efficient in-model router that predicts this expected gain by probing the small translator's prompt-token representation, without requiring external models or hypothesis decoding. Extensive experiments demonstrate that RouteLMT outperforms heuristic and quality/difficulty-estimation baselines, achieving a superior quality-budget Pareto frontier. Furthermore, we analyze regression risks and show that a simple guarded variant can mitigate severe quality losses.
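The abstract's "guarded variant" for mitigating regression risk can be illustrated by adding a floor on the predicted gain: a sample is escalated only if it is both within budget and expected to improve by more than a threshold. This is a hedged sketch of one plausible guard; the function name, threshold parameter, and top-k budget rule are assumptions, not the paper's exact mechanism.

```python
import numpy as np

def guarded_route(predicted_gains, budget_fraction, min_gain=0.0):
    """Budgeted routing with a regression guard: escalate only samples
    that are in the top budget fraction AND whose predicted gain
    exceeds min_gain. Returns a boolean mask (True = large model)."""
    gains = np.asarray(predicted_gains)
    n = len(gains)
    k = int(budget_fraction * n)
    order = np.argsort(gains)[::-1]       # descending predicted gain
    mask = np.zeros(n, dtype=bool)
    mask[order[:k]] = True                # budget constraint
    mask &= gains > min_gain              # guard: never escalate for no expected benefit
    return mask
```

The guard trades a little budget utilization for safety: some budget may go unspent, but samples predicted to gain nothing (or regress) stay with the small model.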