ArXiv TLDR

Diffusion Language Models for Speech Recognition

arXiv:2604.14001

Davyd Naveriani, Albert Zeyer, Ralf Schlüter, Hermann Ney

cs.CL · cs.AI · cs.LG · cs.NE

TLDR

Diffusion language models (MDLM, USDM) are applied to speech recognition, improving accuracy via rescoring and a novel joint-decoding method.
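MDLM and USDM differ mainly in how their forward (noising) process corrupts text. The sketch below is a minimal illustration, assuming the usual per-token formulation in which each token survives with a keep probability `alpha_t` at noise level t. The function names, the toy vocabulary, and the `<mask>` token string are illustrative choices, not taken from the paper.

```python
import random
from typing import List

MASK = "<mask>"  # illustrative mask-token string


def mask_corrupt(tokens: List[str], alpha_t: float, rng: random.Random) -> List[str]:
    """MDLM-style noising: keep each token with probability alpha_t,
    otherwise replace it with the mask token."""
    return [tok if rng.random() < alpha_t else MASK for tok in tokens]


def uniform_corrupt(tokens: List[str], alpha_t: float, vocab: List[str],
                    rng: random.Random) -> List[str]:
    """USDM-style noising: keep each token with probability alpha_t,
    otherwise resample it uniformly from the vocabulary (possibly itself)."""
    return [tok if rng.random() < alpha_t else rng.choice(vocab) for tok in tokens]


vocab = ["the", "cat", "sat", "on", "mat", "a", "dog"]
sentence = ["the", "cat", "sat", "on", "the", "mat"]
rng = random.Random(0)
print(mask_corrupt(sentence, alpha_t=0.5, rng=rng))
print(uniform_corrupt(sentence, alpha_t=0.5, vocab=vocab, rng=rng))
```

After uniform corruption every position still holds an ordinary vocabulary token, so a USDM denoiser has to judge which tokens are wrong, whereas an MDLM denoiser is told exactly which positions to fill.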

Key contributions

  • Introduces a comprehensive guide for incorporating masked diffusion (MDLM) and uniform-state diffusion (USDM) models for ASR rescoring.
  • Designs a new joint-decoding method for CTC and USDM that, at each decoding step, combines CTC's framewise probability distributions with USDM's labelwise distributions to generate new candidates.
  • Shows that both USDM and MDLM significantly improve the accuracy of recognized text.
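The rescoring use case can be pictured as re-ranking an N-best list with a log-linear combination of acoustic and LM scores. As a hedged sketch, the LM score here is a pseudo-log-likelihood (mask one position at a time and sum the log-probability of the original token), a common way to score text with masked LMs; the paper may instead use a diffusion likelihood bound or another criterion. `toy_denoiser`, its lookup table, `lm_scale`, and the N-best scores are all hypothetical stand-ins for a trained MDLM and a real ASR system.

```python
import math
from typing import Callable, Dict, List, Tuple

MASK = "<mask>"
# Hypothetical denoiser interface: given a (partially) masked sequence, return
# one probability distribution (token -> probability) per position.
Denoiser = Callable[[List[str]], List[Dict[str, float]]]


def pseudo_log_likelihood(tokens: List[str], denoiser: Denoiser) -> float:
    """Mask one position at a time and sum the log-probability the model
    assigns to the original token at that position."""
    total = 0.0
    for i, tok in enumerate(tokens):
        masked = tokens[:i] + [MASK] + tokens[i + 1:]
        dist = denoiser(masked)[i]
        total += math.log(max(dist.get(tok, 0.0), 1e-12))  # floor avoids log(0)
    return total


def rescore(nbest: List[Tuple[List[str], float]], denoiser: Denoiser,
            lm_scale: float) -> List[Tuple[List[str], float]]:
    """Add the scaled LM log-score to each acoustic log-score; best hypothesis first."""
    scored = [(hyp, am + lm_scale * pseudo_log_likelihood(hyp, denoiser)) for hyp, am in nbest]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy stand-in for a trained MDLM: fills a masked slot from its left neighbour
# via a tiny lookup table (a real MDLM conditions on both sides).
BIGRAM = {
    "<s>": {"i": 0.9, "eye": 0.1},
    "i": {"scream": 0.5, "see": 0.5},
    "eye": {"scream": 0.5, "see": 0.5},
}


def toy_denoiser(seq: List[str]) -> List[Dict[str, float]]:
    out = []
    for i, tok in enumerate(seq):
        if tok == MASK:
            out.append(BIGRAM.get(seq[i - 1] if i > 0 else "<s>", {}))
        else:
            out.append({tok: 1.0})  # unmasked positions are simply copied
    return out


nbest = [(["eye", "scream"], -1.0), (["i", "scream"], -1.2)]  # made-up acoustic log-scores
for hyp, score in rescore(nbest, toy_denoiser, lm_scale=0.5):
    print(" ".join(hyp), round(score, 3))
```

With `lm_scale=0.0` the acoustically preferred "eye scream" stays on top; raising the LM weight lets the language model promote "i scream". In practice such a scale would be tuned on a development set.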

Why it matters

This paper evaluates diffusion language models, which support bidirectional attention and parallel text generation, as an alternative to standard LMs in speech recognition. It improves ASR accuracy both by rescoring hypotheses with MDLM and USDM and through a new CTC+USDM joint-decoding method, and the authors publish all code and recipes.

Original Abstract

Diffusion language models have recently emerged as a leading alternative to standard language models, due to their ability for bidirectional attention and parallel text generation. In this work, we explore variants for their use in speech recognition. Specifically, we introduce a comprehensive guide to incorporating masked diffusion language models (MDLM) and uniform-state diffusion models (USDMs) for rescoring ASR hypotheses. Additionally, we design a new joint-decoding method that combines CTC and USDM by integrating the framewise probability distributions derived from CTC with the labelwise probability distributions computed by USDM at each decoding step, thereby generating new candidates that combine strong language knowledge from USDM and acoustic information from CTC. Our findings reveal that USDM, as well as MDLM, can significantly improve the accuracy of recognized text. We publish all our code and recipes.
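The abstract describes the joint decoder only at a high level, so the following is one possible reading, not the authors' algorithm. It assumes that (1) CTC frame posteriors are turned into one distribution per output label by grouping frames along the greedy CTC path and averaging their non-blank posteriors, (2) the hypothesis length stays fixed by that segmentation, and (3) at each step the CTC and USDM distributions for each position are fused log-linearly and the argmax gives the next candidate. `toy_usdm`, its table, `lm_scale`, `steps`, and the frame posteriors are hypothetical.

```python
import math
from typing import Callable, Dict, List

BLANK = "<b>"            # CTC blank symbol (illustrative name)
Dist = Dict[str, float]  # token -> probability


def ctc_labelwise(frames: List[Dist]) -> List[Dist]:
    """Group frames along the greedy CTC path (merge repeats, drop blanks) and
    average the non-blank posteriors of each group into one label-level
    distribution. This segmentation rule is an assumption for illustration."""
    segments: List[List[Dist]] = []
    prev = None
    for frame in frames:
        best = max(frame, key=frame.get)
        if best != BLANK:
            if best != prev:  # a new label starts after a blank or a different label
                segments.append([])
            segments[-1].append(frame)
        prev = best
    out: List[Dist] = []
    for seg in segments:
        acc: Dist = {}
        for frame in seg:
            for tok, p in frame.items():
                if tok != BLANK:
                    acc[tok] = acc.get(tok, 0.0) + p
        z = sum(acc.values())
        out.append({tok: p / z for tok, p in acc.items()})
    return out


def fuse(ctc: Dist, lm: Dist, lm_scale: float) -> Dist:
    """Log-linear fusion of one label position, renormalized over both vocabularies."""
    vocab = set(ctc) | set(lm)
    logits = {
        t: math.log(max(ctc.get(t, 0.0), 1e-12)) + lm_scale * math.log(max(lm.get(t, 0.0), 1e-12))
        for t in vocab
    }
    m = max(logits.values())
    exp = {t: math.exp(v - m) for t, v in logits.items()}
    z = sum(exp.values())
    return {t: v / z for t, v in exp.items()}


def joint_decode(frames: List[Dist], usdm: Callable[[List[str]], List[Dist]],
                 lm_scale: float = 1.0, steps: int = 3) -> List[str]:
    ctc = ctc_labelwise(frames)
    hyp = [max(d, key=d.get) for d in ctc]  # start from the CTC greedy output
    for _ in range(steps):
        lm = usdm(hyp)  # one distribution per position, given the current candidate
        new = [max(f, key=f.get) for f in (fuse(c, l, lm_scale) for c, l in zip(ctc, lm))]
        if new == hyp:  # converged: fusion no longer changes the candidate
            break
        hyp = new
    return hyp


# Hypothetical stand-in for a trained USDM denoiser: predicts each position from
# its left neighbour only (a real USDM attends over the whole sequence).
TABLE = {
    "<s>": {"i": 0.9, "eye": 0.1},
    "i": {"scream": 0.9, "cream": 0.1},
    "eye": {"scream": 0.5, "cream": 0.5},
}


def toy_usdm(seq: List[str]) -> List[Dist]:
    return [TABLE.get(seq[i - 1] if i > 0 else "<s>", {}) for i in range(len(seq))]


frames = [
    {"eye": 0.55, "i": 0.45, BLANK: 0.0},
    {BLANK: 0.9, "eye": 0.05, "i": 0.05},
    {"scream": 0.8, "cream": 0.2, BLANK: 0.0},
]
print(joint_decode(frames, toy_usdm, lm_scale=1.0))  # ['i', 'scream']
```

Starting from the CTC greedy output "eye scream", the fused scores switch the first label to "i" after one step, and the loop stops once the candidate no longer changes. A real decoder would likely keep several candidates (a beam) rather than a single argmax sequence and would need to handle insertions and deletions, which this fixed-length sketch ignores.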
