Pen-Strategist: A Reasoning Framework for Penetration Testing Strategy Formation and Analysis
Yasod Ginige, Pasindu Marasinghe, Sajal Jain, Suranga Seneviratne
TLDR
Pen-Strategist is a framework that pairs a fine-tuned, domain-specific reasoning model with a classifier to improve strategy formation and action selection in automated penetration testing.
Key contributions
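- Proposes Pen-Strategist, a framework that combines a domain-specific reasoning model for deriving pentesting strategies with a classifier that converts those strategies into actionable steps.
- A Qwen-3-14B model fine-tuned with reinforcement learning improves strategy derivation performance by 87% over the baseline on the test split of the authors' reasoning dataset.
- Integrated into PentestGPT, the fine-tuned model improves subtask completion on vulnerable machines by 47.5% and surpasses the GPT-5 baseline.
- A semantic-based CNN classifier for step prediction outperforms commercial LLMs by 28% and improves execution stability.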
Why it matters
The shortage of cybersecurity professionals demands better automated pentesting. Pen-Strategist significantly improves LLM-based agents' strategy formation and action selection, making automated security systems more effective and stable.
Original Abstract
Cyber threats are rapidly increasing, expanding their impact from large-scale enterprises to government services and individual users, making robust security systems increasingly essential. However, a significant shortage of skilled cybersecurity professionals exacerbates this challenge. While recent research has explored automating tasks such as penetration testing using LLM-based agents, existing frameworks often perform poorly due to limited capability in strategy formulation, domain-specific reasoning, and accurate action and tool selection. To overcome these limitations, we propose the Pen-Strategist framework, consisting of a novel domain-specific reasoning model that derives pentesting strategies via logical reasoning and a classifier that converts the strategies into actionable steps. First, we construct a reasoning dataset containing logical explanations for both strategy derivation and step selection in pentesting scenarios. We then fine-tune a Qwen-3-14B model for strategy generation using reinforcement learning. Evaluation on the test split of the dataset demonstrates an 87% improvement in strategy derivation performance compared to the baseline. Furthermore, we integrate the fine-tuned Pen-Strategist model into existing automated pentesting frameworks, such as PentestGPT, and evaluate its performance on vulnerable machines, achieving a 47.5% improvement in subtask completion while surpassing the baseline GPT-5. Further experiments on the CTFKnow benchmark show an 18% performance gain over the base model. For step prediction, we train a semantic-based CNN classifier, which outperforms commercial LLMs by 28% and enhances execution stability. Finally, we conduct a user study to qualitatively assess the generated strategies, and Pen-Strategist demonstrates superior performance compared to Claude-4.6-Sonnet.
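The abstract reports its gains as percentages without stating whether they are relative improvements or absolute percentage-point differences. As a reading aid only, the sketch below shows how the two interpretations differ. The scores are hypothetical values chosen for illustration and are not taken from the paper; the function names are likewise assumptions.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Relative gain of `new` over `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def absolute_improvement(baseline: float, new: float) -> float:
    """Absolute gain in percentage points, for scores on a 0-1 scale."""
    return (new - baseline) * 100.0


# Hypothetical scores for illustration only (NOT results from the paper).
baseline_score = 0.40
new_score = 0.748

rel = relative_improvement(baseline_score, new_score)
abs_pts = absolute_improvement(baseline_score, new_score)
print(f"relative: {rel:.1f}%  absolute: {abs_pts:.1f} points")
```

With these made-up numbers, the same pair of scores reads as an 87% relative improvement but only a 34.8-point absolute gain, which is why the full paper's tables are worth checking before comparing the headline figures.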