Dynamic Cyber Ranges
Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone, Maite Del Mundo De Torres, George Nicolaou, and 4 more authors
TLDR
Dynamic Cyber Ranges introduce LLM-driven defenders to harden infrastructure and evaluate advanced AI agents, preserving evaluation headroom.
Key contributions
- LLM-driven APT agents easily compromise current static cyber ranges, confirming that these environments offer diminishing resistance as evaluation benchmarks.
- Introduces Dynamic Cyber Ranges with LLM-driven Defender agents for real-time hardening, monitoring, and response.
- Defender agents reduce attacker success to 0-55%, achieving complete prevention in multiple configurations.
- A smaller, on-premise LLM matched frontier models' defensive outcomes on multiple scenarios and detected the attacker 10x faster on a complex enterprise scenario.
Why it matters
Current cyber ranges are becoming obsolete for evaluating advanced LLM agents. This paper introduces a novel approach using dynamic, LLM-driven defenders to create more robust and realistic evaluation environments. It also highlights the surprising effectiveness of smaller, on-premise models for defense.
Original Abstract
As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation frontier, offer diminishing resistance under their current static design. We validate this observation by deploying an LLM-driven Advanced Persistent Threat (APT) agent across three tiers of increasingly realistic infrastructure (PRO Labs, MHBench, military-grade CYBER RANGES). To counteract this trend, we propose Dynamic Cyber Ranges: cyber range environments augmented with LLM-driven Defender agents that harden infrastructure, monitor for intrusions, and respond in real time. Across evaluated scenarios, Defender agents reduce attacker success to 0-55%, achieving complete prevention on multiple configurations. Since attacker and defender agents draw from the same underlying model capabilities, Dynamic Cyber Ranges preserve evaluation headroom as models improve. Notably, a smaller, specialized on-premise model (alias2-mini) matched the frontier model's defensive outcomes on multiple scenarios under identical, untuned prompts, and detected the attacker 10x faster on a complex enterprise scenario, suggesting that privacy-preserving on-premise models can serve as competent defenders against frontier-class attackers. The experiments further surface emergent agent behaviors, including scope expansion and prompt exfiltration, with implications for AI benchmark integrity and agentic system design.
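The core loop the abstract describes — a Defender agent that hardens infrastructure before the attack, monitors for intrusions, and responds in real time while an APT agent probes hosts — can be caricatured in a few lines of Python. This is a minimal illustrative sketch, not the paper's implementation: the class names, host list, and hardening probability are all assumptions made up for this example.

```python
import random

random.seed(0)  # deterministic for the illustration

HOSTS = ["web", "db", "dc", "mail"]

class CyberRange:
    """A cyber range: a set of hosts with exploitable services."""
    def __init__(self, hosts):
        self.vulnerable = set(hosts)   # hosts an attacker can exploit
        self.compromised = set()       # hosts the attacker currently holds

class DefenderAgent:
    """Stand-in for an LLM-driven defender: harden, monitor, respond."""
    HARDENING_RATE = 0.6  # assumed fraction of services patched pre-attack

    def harden(self, env):
        # Pre-attack hardening: patch a subset of vulnerable hosts.
        for host in sorted(env.vulnerable):
            if random.random() < self.HARDENING_RATE:
                env.vulnerable.discard(host)

    def respond(self, env, host):
        # Real-time response: evict the attacker and patch the host.
        env.compromised.discard(host)
        env.vulnerable.discard(host)

def run_attack(env, defender=None, steps=10):
    """APT agent tries one host per step; a defender, if present, reacts."""
    if defender:
        defender.harden(env)
    for _ in range(steps):
        target = random.choice(HOSTS)
        if target in env.vulnerable:
            env.compromised.add(target)
            if defender:
                defender.respond(env, target)  # intrusion detected -> respond
    return len(env.compromised)

static_result = run_attack(CyberRange(HOSTS))                          # static range
dynamic_result = run_attack(CyberRange(HOSTS), defender=DefenderAgent())  # dynamic range
print(f"static range: {static_result} hosts held; dynamic range: {dynamic_result}")
```

In the static range the attacker retains every host it reaches, while the dynamic range's hardening and response steps are what the paper credits with cutting attacker success to 0-55%. In the real system both roles are driven by LLMs of comparable capability, which is why the evaluation headroom scales with the models rather than saturating.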