ArXiv TLDR

ClawGuard: A Runtime Security Framework for Tool-Augmented LLM Agents Against Indirect Prompt Injection

arXiv:2604.11790

Wei Zhao, Zhe Li, Peixin Zhang, Jun Sun

cs.CR, cs.AI

TLDR

ClawGuard is a runtime security framework that protects tool-augmented LLM agents from indirect prompt injection by enforcing user-confirmed rules at every tool-call boundary.

Key contributions

  • Secures tool-augmented LLM agents against indirect prompt injection attacks.
  • Enforces user-confirmed rules at every tool-call boundary for a deterministic, auditable defense (see the sketch after this list).
  • Blocks all three injection pathways: web and local content, MCP server, and skill file injection.
  • Achieves robust protection without requiring model modification or infrastructure changes.
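The enforcement step can be pictured as a thin allowlist layer sitting between the agent and its tools. Below is a minimal sketch in Python; the `Rule` and `ToolCallGuard` names, the glob-style argument patterns, and the example rules are illustrative assumptions, not ClawGuard's actual API (see the paper's repository for the real implementation).

```python
# Minimal sketch of deterministic tool-call boundary enforcement,
# loosely modeled on the paper's description. All names (Rule,
# ToolCallGuard, etc.) are illustrative, not ClawGuard's actual API.
from dataclasses import dataclass, field
from fnmatch import fnmatch


@dataclass
class Rule:
    """A user-confirmed constraint: which tool may be called, and with
    what argument patterns (glob-style)."""
    tool: str
    arg_patterns: dict[str, str] = field(default_factory=dict)

    def permits(self, tool: str, args: dict) -> bool:
        if tool != self.tool:
            return False
        return all(
            fnmatch(str(args.get(key, "")), pattern)
            for key, pattern in self.arg_patterns.items()
        )


class ToolCallGuard:
    """Intercepts every tool call and allows it only if some
    user-confirmed rule permits it; everything else is blocked
    before it can produce a real-world effect."""

    def __init__(self, rules: list[Rule]):
        self.rules = rules
        self.audit_log: list[tuple[str, dict, bool]] = []

    def check(self, tool: str, args: dict) -> bool:
        allowed = any(rule.permits(tool, args) for rule in self.rules)
        self.audit_log.append((tool, args, allowed))  # auditable trail
        return allowed


# Example: rules for the task "summarize the docs in ~/project"
# (hypothetical; the paper derives constraints from the user's stated
# objective automatically, before any external tool is invoked).
guard = ToolCallGuard([
    Rule("read_file", {"path": "/home/user/project/*"}),
    Rule("web_search", {"query": "*"}),
])

# A benign call passes; an injected exfiltration attempt is blocked.
assert guard.check("read_file", {"path": "/home/user/project/notes.md"})
assert not guard.check("send_email", {"to": "attacker@evil.example"})
```

Because the decision is a plain rule match rather than a judgment by the model, an injected instruction in tool output cannot talk its way past the boundary: the call either matches a confirmed rule or it does not.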

Why it matters

Tool-augmented LLM agents are highly vulnerable to indirect prompt injection, where malicious instructions embedded in tool-returned content are absorbed into the conversation history as trusted observations. ClawGuard provides a deterministic, auditable defense by intercepting adversarial tool calls before they can produce any real-world effect, without model modification or infrastructure changes.

Original Abstract

Tool-augmented Large Language Model (LLM) agents have demonstrated impressive capabilities in automating complex, multi-step real-world tasks, yet remain vulnerable to indirect prompt injection. Adversaries exploit this weakness by embedding malicious instructions within tool-returned content, which agents directly incorporate into their conversation history as trusted observations. This vulnerability manifests across three primary attack channels: web and local content injection, MCP server injection, and skill file injection. To address these vulnerabilities, we introduce ClawGuard, a novel runtime security framework that enforces a user-confirmed rule set at every tool-call boundary, transforming unreliable alignment-dependent defense into a deterministic, auditable mechanism that intercepts adversarial tool calls before any real-world effect is produced. By automatically deriving task-specific access constraints from the user's stated objective prior to any external tool invocation, ClawGuard blocks all three injection pathways without model modification or infrastructure change. Experiments across five state-of-the-art language models on AgentDojo, SkillInject, and MCPSafeBench demonstrate that ClawGuard achieves robust protection against indirect prompt injection without compromising agent utility. This work establishes deterministic tool-call boundary enforcement as an effective defense mechanism for secure agentic AI systems, requiring neither safety-specific fine-tuning nor architectural modification. Code is publicly available at https://github.com/Claw-Guard/ClawGuard.
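The abstract's key preemptive step, deriving the constraint set from the user's stated objective before any tool runs, could look roughly like the following. This is a hedged sketch, not the paper's implementation: the prompt wording, the JSON rule schema, and `confirm_with_user` are assumptions for illustration, and `llm` stands in for any string-to-string model call.

```python
# Hedged sketch of the rule-derivation step: before any external tool
# runs, ask the base model to propose task-specific constraints from
# the user's objective, then require explicit user confirmation.
# The prompt and schema below are assumptions, not ClawGuard's code.
import json


def derive_rules(objective: str, llm) -> list[dict]:
    """Ask the model which tool calls the task legitimately needs."""
    prompt = (
        "Given the user objective below, list the minimal set of tool "
        "calls required, as a JSON array of objects with 'tool' and "
        "'arg_patterns' (glob-style) fields. Objective: " + objective
    )
    reply = llm(prompt)  # llm: any callable str -> str
    return json.loads(reply)


def confirm_with_user(rules: list[dict]) -> list[dict]:
    """Show each proposed rule and keep only what the user approves;
    the confirmed rules become the deterministic allowlist enforced
    at every subsequent tool-call boundary."""
    approved = []
    for rule in rules:
        answer = input(f"Allow {rule['tool']} with {rule['arg_patterns']}? [y/N] ")
        if answer.strip().lower() == "y":
            approved.append(rule)
    return approved


# Usage (with any str -> str LLM callable):
#   rules = confirm_with_user(derive_rules("Summarize ~/project docs", my_llm))
```

Confirmed rules would then seed a boundary guard like the one sketched above, which is what makes the defense deterministic rather than dependent on the model's alignment holding up under adversarial input.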
