Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis

May 6, 2026 · arXiv:2605.04499

Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne

cs.CR, cs.AI

TLDR

Pen-Strategist is a new framework that uses logical reasoning and a classifier to improve automated penetration testing strategy formation and action selection.

Key contributions

Proposes Pen-Strategist, a framework with a domain-specific reasoning model for pentesting strategies.
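A Qwen-3-14B model fine-tuned with reinforcement learning improves strategy derivation by 87% over the baseline on the dataset's test split.
Integrated into PentestGPT, it improves subtask completion on vulnerable machines by 47.5% and surpasses the GPT-5 baseline.
A semantic-based CNN classifier for step prediction outperforms commercial LLMs by 28% and improves execution stability.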

Why it matters

The shortage of cybersecurity professionals demands better automated pentesting. Pen-Strategist significantly improves LLM-based agents' strategy formation and action selection, making automated security systems more effective and stable.

Original Abstract

Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, and Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.
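The two-stage design described in the abstract (a reasoning model that derives a strategy, followed by a classifier that maps the strategy to a concrete step) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the action labels, the `Strategy` type, and the function names are hypothetical, a fixed rule stands in for the fine-tuned Qwen-3-14B model, and a keyword-overlap scorer stands in for the CNN classifier. It shows only the data flow, not the authors' implementation.

```python
from dataclasses import dataclass

# Hypothetical action labels and keywords; the paper's real label set is not given here.
ACTION_KEYWORDS = {
    "port_scan": {"ports", "services", "scan"},
    "web_enumeration": {"web", "directories", "http"},
    "privilege_escalation": {"root", "privileges", "escalate"},
}


@dataclass
class Strategy:
    reasoning: str  # logical explanation produced alongside the strategy
    plan: str       # strategy text handed to the step classifier


def derive_strategy(observation: str) -> Strategy:
    """Placeholder for the reasoning model (a fixed rule, not an LLM call)."""
    if "http" in observation.lower():
        return Strategy(
            reasoning="An HTTP service is exposed, so web content is a likely entry point.",
            plan="Enumerate web directories on the http service",
        )
    return Strategy(
        reasoning="No services are known yet, so discovery comes first.",
        plan="Scan open ports and identify services",
    )


def classify_step(strategy: Strategy) -> str:
    """Placeholder for the step classifier: pick the label with the most keyword overlap."""
    tokens = set(strategy.plan.lower().split())
    return max(ACTION_KEYWORDS, key=lambda label: len(ACTION_KEYWORDS[label] & tokens))


def next_action(observation: str) -> str:
    """Stage 1 derives a strategy; stage 2 converts it into one discrete step."""
    return classify_step(derive_strategy(observation))


if __name__ == "__main__":
    print(next_action("Target 10.0.0.5, nothing known yet"))
    print(next_action("Port 80 open: HTTP Apache"))
```

Separating the free-form strategy from a closed set of step labels is one plausible reading of why a classifier could improve execution stability: the agent's next step is always drawn from a fixed vocabulary rather than parsed from generated text.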

