Enforcing Benign Trajectories: A Behavioral Firewall for Structured-Workflow AI Agents
TL;DR
This paper introduces \codename, a behavioral firewall that compiles benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA) and uses it to enforce benign tool-call trajectories for LLM agents at low latency.
Key contributions
- \codename is a behavioral firewall for LLM agents, using a pDFA compiled from benign tool-call telemetry.
- It enforces permitted tool sequences, sequential contexts, and parameter bounds with O(1) runtime lookup.
- Achieves a 5.6% macro-averaged attack success rate (ASR) across five Agent Security Bench scenarios, dropping to 2.2% within three structured workflows and outperforming Aegis, a state-of-the-art stateless scanner (12.8% ASR).
- Introduces only 2.2 ms of per-call latency (a 3.7× speedup over Aegis) while maintaining a low 2.0% benign task failure rate.
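The enforcement idea behind the bullets above can be sketched in a few lines: a table of permitted (state, tool) transitions plus per-parameter guards, checked with constant-time lookups before a call executes. This is a minimal illustrative sketch, not the paper's implementation; all names (`PDFA`, `step`, the example tools and addresses) are hypothetical.

```python
# Minimal sketch of pDFA-style enforcement: permitted tool sequences,
# sequential context, and parameter guards, all checked via O(1) dict lookups.
from dataclasses import dataclass, field


@dataclass
class PDFA:
    # (state, tool) -> next state: the permitted tool sequences.
    transitions: dict
    # (state, tool, param) -> predicate: parameter bounds / whitelists.
    guards: dict = field(default_factory=dict)

    def step(self, state, tool, params):
        """Return the next state, or None if the call should be blocked."""
        nxt = self.transitions.get((state, tool))
        if nxt is None:
            return None  # tool not permitted in this sequential context
        for name, value in params.items():
            guard = self.guards.get((state, tool, name))
            if guard is not None and not guard(value):
                return None  # parameter outside its learned bound / whitelist
        return nxt


# Hypothetical workflow: `send_email` may only follow `read_inbox`,
# and its recipient must exactly match a whitelisted address.
pdfa = PDFA(
    transitions={("start", "read_inbox"): "read", ("read", "send_email"): "done"},
    guards={("read", "send_email", "to"): lambda v: v in {"alice@corp.example"}},
)

state = pdfa.step("start", "read_inbox", {})          # -> "read"
blocked = pdfa.step(state, "send_email", {"to": "eve@evil.example"})  # -> None
done = pdfa.step(state, "send_email", {"to": "alice@corp.example"})   # -> "done"
```

Because the expensive work (mining transitions and guards from telemetry) happens offline, the runtime gateway reduces to these dictionary lookups, which is how the paper's low per-call latency is plausible.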
Why it matters
LLM agents that execute tool calls against sensitive external environments pose real security risks. This paper offers an effective defense by modeling and enforcing benign behavioral trajectories: \codename collapses the available attack surface for structured-workflow agents while demonstrating strong detection performance and low overhead.
Original Abstract
Structured-workflow agents driven by large language models execute tool calls against sensitive external environments. We propose \codename, a telemetry-driven behavioral anomaly detection firewall. Drawing on sequence-based intrusion detection, \codename\ compiles verified benign tool-call telemetry into a parameterized deterministic finite automaton (pDFA). The model defines permitted tool sequences, sequential contexts, and parameter bounds. At runtime, a lightweight gateway enforces these boundaries via an $O(1)$ state-transition structural lookup, shifting computationally expensive analysis entirely offline. Evaluated on the Agent Security Bench (ASB), \codename\ achieves a 5.6\% macro-averaged attack success rate (ASR) across five scenarios. Within three structured workflows, ASR drops to 2.2\%, outperforming Aegis, a state-of-the-art stateless scanner, at 12.8\%. \codename\ achieves 0\% ASR on multi-step and context-sequential attacks in structured settings. Furthermore, against 1,000 algorithmically spliced exfiltration payloads, only 1.4\% matched valid structural paths, all of which failed end-to-end string parameter guards (0 successes out of 14 surviving paths, 95\% CI [0\%, 23.2\%]). \codename\ introduces just 2.2~ms of per-call latency (a 3.7$\times$ speedup over \textsc{Aegis}) while maintaining a 2.0\% benign task failure rate (BTFR) on benign workloads. Modeling the behavioral trajectory effectively collapses the available attack surface, but unmaintained continuous parameter bounds remain vulnerable to synonym-substitution attacks (18\% evasion rate). Thus, exact-match whitelisting of sensitive parameters ultimately bears the final defensive load against execution.