arXiv TLDR

Towards Automated Pentesting with Large Language Models

arXiv: 2604.11772

Ricardo Bessa, Rui Claro, João Trindade, João Lourenço

cs.CR

TLDR

RedShell is a privacy-preserving, hardware-efficient framework using fine-tuned LLMs to automate PowerShell code generation for Windows pentesting.

Key contributions

  • Introduces RedShell, an LLM framework for automated PowerShell code generation for Windows pentesting.
  • Achieves over 90% syntactic validity and strong semantic alignment with reference pentesting snippets.
  • Outperforms state-of-the-art counterparts on edit-distance-based code similarity (over 50% average similarity to reference snippets) and demonstrates reliable execution in a realistic test environment.
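The similarity figure above is based on edit distance between generated and reference snippets. A minimal sketch of such a metric, assuming a normalized Levenshtein similarity (the paper's exact normalization is not specified, and the snippets below are illustrative, not from the RedShell dataset):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def code_similarity(generated: str, reference: str) -> float:
    """Similarity = 1 - edit_distance / max_length, in [0, 1]."""
    if not generated and not reference:
        return 1.0
    return 1.0 - levenshtein(generated, reference) / max(len(generated), len(reference))

# Illustrative PowerShell snippets (hypothetical, not taken from the paper):
gen = "Get-Process | Where-Object {$_.CPU -gt 100}"
ref = "Get-Process | Where-Object {$_.CPU -gt 50}"
print(round(code_similarity(gen, ref), 2))
```

Under this reading, "over 50% average code similarity" means generated snippets need fewer than half as many character edits as the longer snippet's length to match the reference, averaged over the test set.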

Why it matters

LLMs are increasingly abused for malicious code generation. This paper demonstrates an ethical counterpart: using the same generative capabilities to automate pentesting workflows, highlighting the benefits LLMs can offer within controlled cybersecurity environments.

Original Abstract

Large Language Models (LLMs) are redefining offensive cybersecurity by allowing the generation of harmful machine code with minimal human intervention. While attackers take advantage of dark LLMs such as XXXGPT and WolfGPT to produce malicious code, ethical hackers can follow similar approaches to automate traditional pentesting workflows. In this work, we present RedShell, a privacy-preserving, hardware-efficient framework that leverages fine-tuned LLMs to assist pentesters in generating offensive PowerShell code targeting Microsoft Windows vulnerabilities. RedShell was trained on a malicious PowerShell dataset from the literature, which we further enhanced with manually curated code samples. Experiments show that our framework achieves over 90% syntactic validity in generated samples and strong semantic alignment with reference pentesting snippets, outperforming state-of-the-art counterparts in distance metrics such as edit distance (above 50% average code similarity). Additionally, functional experiments emphasize the execution reliability of the snippets produced by RedShell in a testing scenario that mirrors real-world settings. This work sheds light on the state-of-the-art research in the field of Generative AI applied to malicious code generation and automated testing, acknowledging the potential benefits that LLMs hold within controlled environments such as pentesting.
