Fabricator or dynamic translator?
TLDR
This paper examines the nature and detection of LLM overgenerations in machine translation, ranging from self-explanations to risky confabulations to appropriate explanations, in a commercial setting.
Key contributions
- LLMs exhibit diverse overgenerations in translation, including self-explanations and risky confabulations.
- These overgenerations are distinct from the 'neurobabble' seen in traditional NMT systems.
- The paper details strategies for detecting and classifying the exact nature of these overgenerations.
- The authors present results from a commercial setting, offering practical insights into LLM-based translation.
Why it matters
This research is crucial for understanding and mitigating the unique challenges of LLM overgeneration in machine translation. It offers practical strategies for improving the reliability and utility of LLM-based translation systems in commercial applications.
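The paper does not spell out its detection strategies here, but one simple heuristic often used as a first-pass filter for overgeneration is a token length-ratio check between source and translation. The sketch below is purely illustrative and not the paper's method; the `max_ratio` threshold is an assumed parameter.

```python
def flag_overgeneration(source: str, translation: str, max_ratio: float = 2.0) -> bool:
    """Flag a translation whose token count exceeds the source's by more than max_ratio.

    A crude proxy only: a real system would go further and classify *what*
    the extra text is (self-explanation, confabulation, or an appropriate
    explanation that aids the target audience).
    """
    src_tokens = source.split()
    tgt_tokens = translation.split()
    if not src_tokens:
        # An empty source with any output at all counts as overgeneration.
        return bool(tgt_tokens)
    return len(tgt_tokens) / len(src_tokens) > max_ratio
```

Such a ratio check catches only gross length anomalies; distinguishing a risky confabulation from a helpful explanation requires semantic analysis of the extra material, which is the harder classification task the paper addresses.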
Original Abstract
LLMs are proving to be adept at machine translation although due to their generative nature they may at times overgenerate in various ways. These overgenerations are different from the neurobabble seen in NMT and range from LLM self-explanations, to risky confabulations, to appropriate explanations, where the LLM is able to act as a human translator would, enabling greater comprehension for the target audience. Detecting and determining the exact nature of the overgenerations is a challenging task. We detail different strategies we have explored for our work in a commercial setting, and present our results.