Discrete Tilt Matching
Yuyuan Chen, Shiyi Wang, Peter Potaptchik, Jaeyeon Kim, Michael S. Albergo
TLDR
Discrete Tilt Matching (DTM) is a likelihood-free method for fine-tuning masked diffusion LLMs, yielding strong gains on structured tasks such as Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
Key contributions
- Introduces Discrete Tilt Matching (DTM), a likelihood-free method for masked diffusion LLM fine-tuning.
- Recasts fine-tuning as state-level matching of local unmasking posteriors under reward tilting.
- Employs a weighted cross-entropy objective with control variates for improved training stability (sketched in code after this list).
- Achieves strong gains on Sudoku and Countdown, competitive on MATH500 and GSM8K.
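To make the weighted-cross-entropy idea concrete, here is a minimal PyTorch sketch. It is an illustration under stated assumptions, not the paper's implementation: `dtm_style_loss`, the tensor shapes, and the constant-baseline control variate are all hypothetical, and the unbiasedness argument assumes the target unmaskings are sampled from the current model (so the score-function identity applies).

```python
import torch
import torch.nn.functional as F

def dtm_style_loss(logits, targets, rewards, beta=1.0):
    """Schematic reward-tilted weighted cross-entropy with a control variate.

    Hypothetical shapes: logits (B, L, V) over masked positions,
    targets (B, L) sampled unmaskings, rewards (B,) scalar rewards.
    This sketches the *shape* of the objective the paper describes
    (weighted CE + control variates), not its exact formulation.
    """
    # Exponential reward tilt exp(r / beta), self-normalized over the batch
    # for numerical stability; beta plays the role of an annealing temperature.
    weights = torch.softmax(rewards / beta, dim=0)              # (B,)

    # Per-sequence cross-entropy of the model's unmasking predictions.
    ce = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        reduction="none",
    ).view(targets.shape).mean(dim=1)                           # (B,)

    # Constant-baseline control variate: if targets are sampled from the
    # current model, the expected gradient of ce is zero (score-function
    # identity), so subtracting a constant from the weights leaves the
    # gradient unbiased while reducing its variance.
    baseline = weights.mean()
    return torch.sum((weights - baseline) * ce)
```

In practice one would also restrict the cross-entropy to the positions actually unmasked at each step and tie `beta` to the annealing schedule whose stability effects the paper analyzes.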
Why it matters
DTM sidesteps the intractable sequence-level marginal likelihoods that standard RL objectives require for masked diffusion LLMs, offering a stable and effective likelihood-free fine-tuning method. This helps establish dLLMs as a practical alternative to autoregressive models for complex generation tasks.
Original Abstract
Masked diffusion large language models (dLLMs) are a promising alternative to autoregressive generation. While reinforcement learning (RL) methods have recently been adapted to dLLM fine-tuning, their objectives typically depend on sequence-level marginal likelihoods, which are intractable for masked diffusion models. To address this, we derive Discrete Tilt Matching (DTM), a likelihood-free method that recasts dLLM fine-tuning as state-level matching of local unmasking posteriors under reward tilting. DTM takes the form of a weighted cross-entropy objective with explicit minimizer, and admits control variates that improve training stability. On a synthetic maze-planning task, we analyze how DTM's annealing schedule and control variates affect training stability and prevent mode collapse. At scale, fine-tuning LLaDA-8B-Instruct with DTM yields strong gains on Sudoku and Countdown while remaining competitive on MATH500 and GSM8K.
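For readers unfamiliar with the term, "reward tilting" generally means exponentially reweighting a base distribution by a reward. The generic form is sketched below; the symbols are standard notation, not the paper's, and DTM applies the idea to local unmasking posteriors rather than to full sequences.

```latex
% Generic exponential tilting of a base distribution p by a reward r,
% with temperature beta (smaller beta = stronger tilt toward high reward):
\[
  p_{\beta}(x) \;=\; \frac{p(x)\, e^{r(x)/\beta}}{\sum_{x'} p(x')\, e^{r(x')/\beta}}
\]
```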