Dynamic Cyber Ranges
Víctor Mayoral-Vilches, María Sanz-Gómez, Francesco Balassone, Maite Del Mundo De Torres, George Nicolaou, and 4 more authors
TLDR
Dynamic Cyber Ranges introduce LLM-driven defenders to harden infrastructure and evaluate advanced AI agents, preserving evaluation headroom.
Key contributions
- LLM-driven APT agents easily compromise current static cyber ranges, confirming that these environments offer diminishing resistance as evaluation benchmarks.
- Introduces Dynamic Cyber Ranges with LLM-driven Defender agents for real-time hardening, monitoring, and response.
- Defender agents reduce attacker success to 0-55%, achieving complete prevention in multiple configurations.
- A smaller, on-premise LLM matched frontier models' defensive outcomes on multiple scenarios and detected the attacker 10x faster on a complex enterprise scenario.
Why it matters
Current cyber ranges are becoming obsolete for evaluating advanced LLM agents. This paper introduces a novel approach using dynamic, LLM-driven defenders to create more robust and realistic evaluation environments. It also highlights the surprising effectiveness of smaller, on-premise models for defense.
Original Abstract
As LLM-driven agents advance in cybersecurity, Jeopardy CTF benchmarks are approaching saturation and cyber ranges, the natural next evaluation frontier, offer diminishing resistance under their current static design. We validate this observation by deploying an LLM-driven Advanced Persistent Threat (APT) agent across three tiers of increasingly realistic infrastructure (PRO Labs, MHBench, military-grade CYBER RANGES). To counteract this trend, we propose Dynamic Cyber Ranges: cyber range environments augmented with LLM-driven Defender agents that harden infrastructure, monitor for intrusions, and respond in real time. Across evaluated scenarios, Defender agents reduce attacker success to 0-55%, achieving complete prevention on multiple configurations. Since attacker and defender agents draw from the same underlying model capabilities, Dynamic Cyber Ranges preserve evaluation headroom as models improve. Notably, a smaller, specialized on-premise model (alias2-mini) matched the frontier model's defensive outcomes on multiple scenarios under identical, untuned prompts, and detected the attacker 10x faster on a complex enterprise scenario, suggesting that privacy-preserving on-premise models can serve as competent defenders against frontier-class attackers. The experiments further surface emergent agent behaviors, including scope expansion and prompt exfiltration, with implications for AI benchmark integrity and agentic system design.
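The core loop the abstract describes — a Defender agent that hardens infrastructure before the attack, monitors for intrusions, and responds in real time while an APT agent probes hosts — can be caricatured in a few lines of Python. This is a minimal illustrative sketch, not the paper's implementation: the class names, host list, and hardening probability are all assumptions made up for this example.

```python
import random

random.seed(0)  # deterministic for the illustration

HOSTS = ["web", "db", "dc", "mail"]

class CyberRange:
    """A cyber range: a set of hosts with exploitable services."""
    def __init__(self, hosts):
        self.vulnerable = set(hosts)   # hosts an attacker can exploit
        self.compromised = set()       # hosts the attacker currently holds

class DefenderAgent:
    """Stand-in for an LLM-driven defender: harden, monitor, respond."""
    HARDENING_RATE = 0.6  # assumed fraction of services patched pre-attack

    def harden(self, env):
        # Pre-attack hardening: patch a subset of vulnerable hosts.
        for host in sorted(env.vulnerable):
            if random.random() < self.HARDENING_RATE:
                env.vulnerable.discard(host)

    def respond(self, env, host):
        # Real-time response: evict the attacker and patch the host.
        env.compromised.discard(host)
        env.vulnerable.discard(host)

def run_attack(env, defender=None, steps=10):
    """APT agent tries one host per step; a defender, if present, reacts."""
    if defender:
        defender.harden(env)
    for _ in range(steps):
        target = random.choice(HOSTS)
        if target in env.vulnerable:
            env.compromised.add(target)
            if defender:
                defender.respond(env, target)  # intrusion detected -> respond
    return len(env.compromised)

static_result = run_attack(CyberRange(HOSTS))                          # static range
dynamic_result = run_attack(CyberRange(HOSTS), defender=DefenderAgent())  # dynamic range
print(f"static range: {static_result} hosts held; dynamic range: {dynamic_result}")
```

In the static range the attacker retains every host it reaches, while the dynamic range's hardening and response steps are what the paper credits with cutting attacker success to 0-55%. In the real system both roles are driven by LLMs of comparable capability, which is why the evaluation headroom scales with the models rather than saturating.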