ArXiv TLDR

Stateful Agent Backdoor

2605.06158

Zhengchunmin Dai, Jiaxiong Tang, Liantao Wu, Peng Sun, Honglong Chen

cs.CR

TLDR

This paper introduces a stateful backdoor attack on LLM-based agents that persists across multiple sessions, enabling incremental, autonomous execution after a one-time trigger injection.

Key contributions

  • Proposes a stateful agent backdoor that extends the attack lifecycle across multiple agent sessions under permission isolation.
  • Maintains attack state through persistent components, enabling autonomous, incremental execution after a one-time trigger injection.
  • Models the attack as a Mealy machine and derives a decomposition framework for independent per-transition data construction; the primary instantiation reaches an 80-95% attack success rate across four models.
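For reference, a Mealy machine is a finite-state transducer whose output depends on both the current state and the current input. The standard definition is below. The symbols are generic textbook notation, not the paper's, and reading each session as one input step, with the state s_t held in a persistent component between sessions, is an assumption of this summary.

```latex
M = (S,\; s_0,\; \Sigma,\; \Lambda,\; \delta,\; \lambda), \qquad
\delta : S \times \Sigma \to S, \qquad
\lambda : S \times \Sigma \to \Lambda

s_{t+1} = \delta(s_t, x_t), \qquad y_t = \lambda(s_t, x_t)
```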

Why it matters

Existing LLM agent backdoors are stateless: they execute fixed behaviors confined to a single session. This work shows that a backdoor can instead carry state across sessions, even under permission isolation, and execute incrementally after a single trigger injection. It points to persistent components as an attack surface that current agent designs do not account for.

Original Abstract

Existing backdoor attacks on Large Language Model-based agents remain stateless, executing fixed behaviors confined to a single session. We propose a stateful agent backdoor that extends the attack lifecycle across multiple sessions under permission isolation. The attack maintains state through persistent components, enabling autonomous, incremental execution across sessions following a one-time trigger injection. Formally, we model the attack as a Mealy machine and derive a decomposition framework that enables independent per-transition data construction. We instantiate this framework with a primary attack and two extensibility variants. The primary instantiation achieves an attack success rate of 80%–95% across four models, with per-transition analysis demonstrating the effectiveness of the decomposition. Extensibility variants with alternative topologies and persistent components demonstrate consistent effectiveness. Code and data are available at https://anonymous.4open.science/r/stateful_agent_backdoor-E89F.
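To make the stateful idea concrete, here is a minimal Python sketch of a Mealy machine whose current state lives in a persistent store between sessions. It is not the authors' code (that is at the URL above). The state names, input symbols, outputs, and the dict `store` standing in for a persistent component are all hypothetical. Each call to `run_session` models one session: it reads the saved state, takes exactly one transition, and writes the new state back, so behavior after the one-time "trigger" input unfolds one step per later session.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# (state, input symbol) -> (next state, output); all labels here are hypothetical.
TransitionTable = Dict[Tuple[str, str], Tuple[str, str]]


@dataclass
class MealyMachine:
    """Finite-state transducer whose output depends on the current state AND the input."""
    initial: str
    transitions: TransitionTable

    def step(self, state: str, symbol: str) -> Tuple[str, str]:
        # Pairs with no defined transition keep the state and produce ordinary behavior.
        return self.transitions.get((state, symbol), (state, "normal"))


def run_session(machine: MealyMachine, store: Dict[str, str], symbol: str) -> str:
    """One session: load state from the persistent store, take one transition, save the result."""
    state = store.get("state", machine.initial)  # the first session starts from the initial state
    next_state, output = machine.step(state, symbol)
    store["state"] = next_state  # this write is what carries state into the next session
    return output


def per_transition_cases(machine: MealyMachine) -> List[Tuple[str, str, str, str]]:
    """Flatten the table: every (state, input, next_state, output) entry stands on its own."""
    return [(s, x, ns, out) for (s, x), (ns, out) in machine.transitions.items()]


example = MealyMachine(
    initial="s0",
    transitions={
        ("s0", "trigger"): ("s1", "record_state"),
        ("s1", "session"): ("s2", "step_1"),
        ("s2", "session"): ("s3", "step_2"),
    },
)

store: Dict[str, str] = {}  # hypothetical stand-in for a persistent component
outputs = [run_session(example, store, x) for x in ["session", "trigger", "session", "session", "session"]]
print(outputs)  # ['normal', 'record_state', 'step_1', 'step_2', 'normal']
```

Because each transition's output depends only on its (state, input) pair, `per_transition_cases` can list the transitions independently, which loosely mirrors the per-transition decomposition the abstract describes; how the paper actually constructs data for each transition is not shown here.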
