Discrete Tilt Matching
Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo
TLDR
Discrete Tilt Matching (DTM) is a likelihood-free method for fine-tuning masked diffusion LLMs, yielding strong gains on structured tasks such as Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
Key contributions
- Introduces Discrete Tilt Matching (DTM), a likelihood-free method for masked diffusion LLM fine-tuning.
- Recasts fine-tuning as state-level matching of local unmasking posteriors under reward tilting.
- Employs a weighted cross-entropy objective with control variates for improved training stability (sketched in code after this list).
- Achieves strong gains on Sudoku and Countdown, competitive on MATH500 and GSM8K.
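To make the weighted-cross-entropy idea concrete, here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the paper's implementation: `dtm_style_loss`, the tensor shapes, and the constant-baseline control variate are all hypothetical, and the unbiasedness argument assumes the target unmaskings are sampled from the current model (so the score-function identity applies).

```python
import torch
import torch.nn.functional as F

def dtm_style_loss(logits, targets, rewards, beta=1.0):
    """Schematic reward-tilted weighted cross-entropy with a control variate.

    Hypothetical shapes: logits (B, L, V) over masked positions,
    targets (B, L) sampled unmaskings, rewards (B,) scalar rewards.
    This sketches the *shape* of the objective the paper describes
    (weighted CE + control variates), not its exact formulation.
    """
    # Exponential reward tilt exp(r / beta), self-normalized over the batch
    # for numerical stability; beta plays the role of an annealing temperature.
    weights = torch.softmax(rewards / beta, dim=0)              # (B,)

    # Per-sequence cross-entropy of the model's unmasking predictions.
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape).mean(dim=1)                           # (B,)

    # Constant-baseline control variate: if targets are sampled from the
    # current model, the expected gradient of ce is zero (score-function
    # identity), so subtracting a constant from the weights leaves the
    # gradient unbiased while reducing its variance.
    baseline = weights.mean()
    return torch.sum((weights - baseline) * ce)
```

In practice one would also restrict the cross-entropy to the positions actually unmasked at each step and tie `beta` to the annealing schedule whose stability effects the paper analyzes.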
Why it matters
DTM sidesteps the intractable sequence-level marginal likelihoods that standard RL objectives require for masked diffusion LLMs, offering a stable and effective likelihood-free fine-tuning method. This helps establish dLLMs as a practical alternative to autoregressive models for complex generation tasks.
Original Abstract
Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
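For readers unfamiliar with the term, "reward tilting" generally means exponentially reweighting a base distribution by a reward. The generic form is sketched below; the symbols are standard notation, not the paper's, and DTM applies the idea to local unmasking posteriors rather than to full sequences.

```latex
% Generic exponential tilting of a base distribution p by a reward r,
% with temperature beta (smaller beta = stronger tilt toward high reward):
\[
  p_{\beta}(x) \;=\; \frac{p(x)\, e^{r(x)/\beta}}{\sum_{x'} p(x')\, e^{r(x')/\beta}}
\]
```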