Inducing Overthink: Hierarchical Genetic Algorithm-based DoS Attack on Black-Box Large Language Reasoning Models
Shuqiang Wang, Wei Cao, Jiaqi Weng, Jialing Tao, Licheng Pan + 2 more
TLDR
A hierarchical genetic algorithm can induce "overthinking" in black-box large reasoning models, enabling denial-of-service (DoS) attacks by drastically increasing response length and resource consumption.
Key contributions
- Develops a black-box hierarchical genetic algorithm (HGA) to induce "overthink" in large reasoning models (LRMs); a toy sketch of such a search loop follows this list.
- Achieves up to a 26.1x increase in output length on the MATH benchmark, consistently outperforming benign and manually crafted missing-premise baselines.
- Shows adversarial inputs transfer effectively from small proxy models to large commercial LRMs.
- Highlights "overthinking" as a shared and exploitable DoS vulnerability in modern reasoning systems.
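The HGA described above evolves perturbed versions of a problem's logical structure to maximize overthinking. Below is a minimal, hypothetical sketch of such a two-level search, assuming the problem is represented as a list of premises; the mutation operators, hyperparameters, and function names are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Hypothetical perturbation operators over a problem's premises.
# These are illustrative stand-ins, not the paper's operators.
def drop_premise(premises, idx):
    """Remove one premise so the problem becomes under-specified."""
    return [p for i, p in enumerate(premises) if i != idx]

def weaken_premise(premises, idx):
    """Make one premise vaguer (stub implementation)."""
    out = list(premises)
    out[idx] = out[idx] + " (approximately)"
    return out

def hga_attack(premises, fitness_fn, generations=20, pop_size=8, elite=2):
    """Toy two-level genetic search over structured problem decompositions.

    Outer level: choose which premise to perturb; inner level: choose how.
    `fitness_fn` scores a candidate premise list, e.g. by querying a proxy
    model and measuring response length plus overthinking markers.
    """
    operators = [drop_premise, weaken_premise]
    population = [list(premises) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness_fn, reverse=True)
        survivors = ranked[:elite]                       # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            parent = random.choice(survivors)
            idx = random.randrange(len(parent))          # outer level: pick a premise
            op = random.choice(operators)                # inner level: pick an operator
            child = op(parent, idx)
            children.append(child if child else parent)  # never keep an empty problem
        population = survivors + children
    return max(population, key=fitness_fn)
```

The `fitness_fn` passed in would query a (proxy) reasoning model with the candidate problem and score its response; one possible scoring function is sketched after the abstract below.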
Why it matters
This research reveals a critical DoS vulnerability in large reasoning models, where incomplete or logically inconsistent inputs can cause "overthinking" and resource exhaustion. It underscores the urgent need for robust defenses against such attacks to ensure the reliability and availability of AI systems.
Original Abstract
Large Reasoning Models (LRMs) are increasingly integrated into systems requiring reliable multi-step inference, yet this growing dependence exposes new vulnerabilities related to computational availability. In particular, LRMs exhibit a tendency to "overthink", producing excessively long and redundant reasoning traces, when confronted with incomplete or logically inconsistent inputs. This behavior significantly increases inference latency and energy consumption, forming a potential vector for denial-of-service (DoS) style resource exhaustion. In this work, we investigate this attack surface and propose an automated black-box framework that induces overthinking in LRMs by systematically perturbing the logical structure of input problems. Our method employs a hierarchical genetic algorithm (HGA) operating on structured problem decompositions, and optimizes a composite fitness function designed to maximize both response length and reflective overthinking markers. Across four state-of-the-art reasoning models, the proposed method substantially amplifies output length, achieving up to a 26.1x increase on the MATH benchmark and consistently outperforming benign and manually crafted missing-premise baselines. We further demonstrate strong transferability, showing that adversarial inputs evolved using a small proxy model retain high effectiveness against large commercial LRMs. These findings highlight overthinking as a shared and exploitable vulnerability in modern reasoning systems, underscoring the need for more robust defenses.
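The abstract's composite fitness function rewards both raw response length and reflective "overthinking" markers. A rough approximation is sketched below; the marker list, weights, and the `query_proxy_model` wiring shown in the final comment are assumptions for illustration, not values reported in the paper.

```python
# Hypothetical composite fitness: reward long responses and reflective
# "overthinking" markers. The marker list and weights are illustrative
# guesses, not values from the paper.
REFLECTION_MARKERS = ("wait", "let me reconsider", "on second thought", "re-examine")

def overthink_fitness(response: str,
                      length_weight: float = 1.0,
                      marker_weight: float = 50.0) -> float:
    """Higher score for longer output and for more reflection markers."""
    text = response.lower()
    marker_hits = sum(text.count(m) for m in REFLECTION_MARKERS)
    return length_weight * len(response.split()) + marker_weight * marker_hits

# Wiring into the search sketch above (query_proxy_model is hypothetical):
# fitness_fn = lambda premises: overthink_fitness(query_proxy_model(" ".join(premises)))
```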