Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought
Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo
TLDR
Abstract Chain-of-Thought enables language models to reason with a short sequence of learned abstract tokens in place of a verbal chain-of-thought, reducing inference cost.
Key contributions
- Proposes Abstract Chain-of-Thought, a discrete latent reasoning method in which the model emits a short reasoning sequence drawn from a reserved token vocabulary.
- Achieves up to 11.6x fewer reasoning tokens with comparable performance on mathematical reasoning, instruction-following, and multi-hop reasoning tasks.
- Trains abstract token generation via a warm-up loop that alternates CoT bottlenecking with self-distillation, followed by reinforcement learning under constrained decoding (see the sketch after this list).
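
The warm-up alternation can be pictured as below. This is a hypothetical reconstruction from the abstract alone, not the authors' code: `bottleneck`, `sft_step`, and `generate_constrained` are assumed stand-in helpers, and the hashing-based bottleneck is purely illustrative of mapping a verbal CoT onto reserved tokens.

```python
# Hypothetical sketch of the Abstract-CoT warm-up loop; all helper names
# below are illustrative assumptions, not the paper's actual API.

def bottleneck(verbal_cot, codebook, max_len=8):
    # Stand-in for "bottlenecking from a verbal CoT via masking": compress
    # the verbal trace into a short sequence of reserved codebook tokens.
    words = verbal_cot.split()[:max_len]
    return [codebook[hash(w) % len(codebook)] for w in words]

def warmup(model, data, codebook, rounds=3):
    for _ in range(rounds):
        # (i) Bottleneck + SFT: supervise (prompt -> abstract tokens -> answer).
        for prompt, verbal_cot, answer in data:
            target = bottleneck(verbal_cot, codebook) + [answer]
            model.sft_step(prompt, target)  # assumed fine-tuning hook

        # (ii) Self-distillation: the model generates abstract tokens from the
        #      prompt alone under constrained decoding, then trains on its own
        #      traces so the latent language becomes self-sustaining.
        for prompt, _, answer in data:
            target = model.generate_constrained(prompt, codebook) + [answer]
            model.sft_step(prompt, target)  # assumed fine-tuning hook
    return model
```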
Why it matters
This paper addresses the high inference cost of explicit Chain-of-Thought reasoning in LLMs. By introducing Abstract-CoT, it offers a method for efficient, latent reasoning that significantly reduces token generation without sacrificing performance. This approach paves the way for more practical and scalable deployment of complex reasoning capabilities in AI.
Original Abstract
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose **Abstract Chain-of-Thought**, a discrete latent reasoning post-training mechanism in which the language model produces a short sequence of tokens from a reserved vocabulary in lieu of a natural language CoT, before generating a response. To make previously unseen "abstract" tokens useful, we introduce a policy iteration-style warm-up loop that alternates between (i) bottlenecking from a verbal CoT via masking and performing supervised fine-tuning, and (ii) self-distillation by training the model to generate abstract tokens from the prompt alone via constrained decoding with the codebook. After warm-up, we optimize the generation of abstract sequences with warm-started reinforcement learning under constrained decoding. Abstract-CoT achieves up to 11.6× fewer reasoning tokens while demonstrating comparable performance across mathematical reasoning, instruction-following, and multi-hop reasoning, and generalizes across language model families. We also find an emergent power law distribution over the abstract vocabulary, akin to those seen in natural language, that evolves across the training phases. Our findings highlight the potential for post-training latent reasoning mechanisms that enable efficient inference through a learned abstract reasoning language.
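
Both the self-distillation and reinforcement learning phases rely on constrained decoding, so the reasoning span can only contain reserved tokens. Below is a minimal sketch of such a constraint using the Hugging Face `generate` API's `prefix_allowed_tokens_fn` hook; the model name and the choice of the last K vocabulary ids as the codebook are placeholder assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumption: the final K vocabulary ids serve as the reserved abstract codebook.
K = 64
codebook = list(range(len(tokenizer) - K, len(tokenizer)))
allowed = codebook + [tokenizer.eos_token_id]  # let the latent phase terminate

def only_codebook(batch_id, input_ids):
    # Called at every decoding step; restricts sampling to codebook tokens.
    return allowed

inputs = tokenizer("Q: If 3x + 5 = 20, what is x?", return_tensors="pt")
latent = model.generate(
    **inputs,
    max_new_tokens=16,                       # short abstract reasoning span
    do_sample=True,
    prefix_allowed_tokens_fn=only_codebook,  # constrained decoding
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(latent[0]))
```

In a full pipeline, the decoded abstract span would be followed by unconstrained generation of the final response, mirroring the paper's two-phase inference.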