Thinking Without Words: Efficient Latent Reasoning with Abstract Chain-of-Thought
Keshav Ramji, Tahira Naseem, Ramón Fernandez Astudillo
TLDR
Abstract Chain-of-Thought enables language models to reason with a short sequence of learned abstract tokens in place of a verbal chain-of-thought, reducing inference cost.
Key contributions
- Proposes Abstract Chain-of-Thought, a discrete latent reasoning method in which the model emits a short reasoning sequence drawn from a reserved token vocabulary.
- Achieves up to 11.6x fewer reasoning tokens with comparable performance on mathematical reasoning, instruction-following, and multi-hop reasoning tasks.
- Trains abstract token generation via a warm-up loop that alternates CoT bottlenecking with self-distillation, followed by reinforcement learning under constrained decoding (see the sketch after this list).
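
The warm-up alternation can be pictured as below. This is a hypothetical reconstruction from the abstract alone, not the authors' code: `bottleneck`, `sft_step`, and `generate_constrained` are assumed stand-in helpers, and the hashing-based bottleneck is purely illustrative of mapping a verbal CoT onto reserved tokens.

```python
# Hypothetical sketch of the Abstract-CoT warm-up loop; all helper names
# below are illustrative assumptions, not the paper's actual API.

def bottleneck(verbal_cot, codebook, max_len=8):
    # Stand-in for "bottlenecking from a verbal CoT via masking": compress
    # the verbal trace into a short sequence of reserved codebook tokens.
    words = verbal_cot.split()[:max_len]
    return [codebook[hash(w) % len(codebook)] for w in words]

def warmup(model, data, codebook, rounds=3):
    for _ in range(rounds):
        # (i) Bottleneck + SFT: supervise (prompt -> abstract tokens -> answer).
        for prompt, verbal_cot, answer in data:
            target = bottleneck(verbal_cot, codebook) + [answer]
            model.sft_step(prompt, target)  # assumed fine-tuning hook

        # (ii) Self-distillation: the model generates abstract tokens from the
        #      prompt alone under constrained decoding, then trains on its own
        #      traces so the latent language becomes self-sustaining.
        for prompt, _, answer in data:
            target = model.generate_constrained(prompt, codebook) + [answer]
            model.sft_step(prompt, target)  # assumed fine-tuning hook
    return model
```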
Why it matters
This paper addresses the high inference cost of explicit Chain-of-Thought reasoning in LLMs. By introducing Abstract-CoT, it offers a method for efficient, latent reasoning that significantly reduces token generation without sacrificing performance. This approach paves the way for more practical and scalable deployment of complex reasoning capabilities in AI.
Original Abstract
While long, explicit chains-of-thought (CoT) have proven effective on complex reasoning tasks, they are costly to generate during inference. Non-verbal reasoning methods have emerged with shorter generation lengths by leveraging continuous representations, yet their performance lags behind verbalized CoT. We propose **Abstract Chain-of-Thought**, a discrete latent reasoning post-training mechanism in which the language model produces a short sequence of tokens from a reserved vocabulary in lieu of a natural language CoT, before generating a response. To make previously unseen "abstract" tokens useful, we introduce a policy iteration-style warm-up loop that alternates between (i) bottlenecking from a verbal CoT via masking and performing supervised fine-tuning, and (ii) self-distillation by training the model to generate abstract tokens from the prompt alone via constrained decoding with the codebook. After warm-up, we optimize the generation of abstract sequences with warm-started reinforcement learning under constrained decoding. Abstract-CoT achieves up to 11.6× fewer reasoning tokens while demonstrating comparable performance across mathematical reasoning, instruction-following, and multi-hop reasoning, and generalizes across language model families. We also find an emergent power law distribution over the abstract vocabulary, akin to those seen in natural language, that evolves across the training phases. Our findings highlight the potential for post-training latent reasoning mechanisms that enable efficient inference through a learned abstract reasoning language.
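
Both the self-distillation and reinforcement learning phases rely on constrained decoding, so the reasoning span can only contain reserved tokens. Below is a minimal sketch of such a constraint using the Hugging Face `generate` API's `prefix_allowed_tokens_fn` hook; the model name and the choice of the last K vocabulary ids as the codebook are placeholder assumptions, not the paper's setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Assumption: the final K vocabulary ids serve as the reserved abstract codebook.
K = 64
codebook = list(range(len(tokenizer) - K, len(tokenizer)))
allowed = codebook + [tokenizer.eos_token_id]  # let the latent phase terminate

def only_codebook(batch_id, input_ids):
    # Called at every decoding step; restricts sampling to codebook tokens.
    return allowed

inputs = tokenizer("Q: If 3x + 5 = 20, what is x?", return_tensors="pt")
latent = model.generate(
    **inputs,
    max_new_tokens=16,                       # short abstract reasoning span
    do_sample=True,
    prefix_allowed_tokens_fn=only_codebook,  # constrained decoding
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(latent[0]))
```

In a full pipeline, the decoded abstract span would be followed by unconstrained generation of the final response, mirroring the paper's two-phase inference.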