ArXiv TLDR

IPI-proxy: An Intercepting Proxy for Red-Teaming Web-Browsing AI Agents Against Indirect Prompt Injection

🐦 Tweet
2605.11868

Chia-Pei, Chen, Kentaroh Toyoda, Anita Lai, Alex Leung

cs.CRcs.AI

TLDR

IPI-proxy is an intercepting proxy for red-teaming web-browsing AI agents against indirect prompt injection by rewriting whitelisted HTTP responses.

Key contributions

  • Intercepting proxy rewrites real HTTP responses from whitelisted domains to embed IPI payloads.
  • Unified library of 820 deduplicated attack strings from six published benchmarks.
  • YAML-driven harness parameterizes payloads, embedding techniques, and HTML insertion points.
  • Companion exfiltration tracker logs successful callbacks, enabling attack measurement.

Why it matters

Existing red-teaming tools fail to address indirect prompt injection in enterprise web-browsing AI agents using whitelisted domains. IPI-proxy provides a unique, reproducible substrate to measure and harden these agents against IPI on the same retrieval surface attackers exploit in production. This bridges static benchmarks and live deployment.

Original Abstract

Web-browsing AI agents are increasingly deployed in enterprise settings under strict whitelists of approved domains, yet adversaries can still influence them by embedding hidden instructions in the HTML pages those domains serve. Existing red-teaming resources fall short of this scenario: prompt-injection benchmarks ship pre-built adversarial pages that whitelisted agents cannot reach, and generic LLM scanners probe the model API rather than its retrieved content. We present IPI-proxy, an open-source toolkit for red-teaming web-browsing agents against indirect prompt injection (IPI). At its core is an intercepting proxy that rewrites real HTTP responses from whitelisted domains in flight, embedding payloads drawn from a unified library of 820 deduplicated attack strings extracted from six published benchmarks (BIPIA, InjecAgent, AgentDojo, Tensor Trust, WASP, and LLMail-Inject). A YAML-driven test harness independently parameterizes the payload set, the embedding technique (HTML comment, invisible CSS, or LLM-generated semantic prose), and the HTML insertion point (6 locations from \icode{head\_meta} to \icode{script\_comment}), enabling parameter-sweep evaluation without mock pages or sandboxed environments. A companion exfiltration tracker logs successful callbacks. This paper describes the threat model, situates IPI-proxy among contemporary IPI benchmarks and red-teaming tools, and details its architecture, design decisions, and configuration interface. By bridging static benchmarks and live deployment, IPI-proxy gives AI security teams a reproducible substrate for measuring and hardening web-browsing agents against indirect prompt injection on the same retrieval surface attackers exploit in production.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.