Poison Once, Exploit Forever: Environment-Injected Memory Poisoning Attacks on Web Agents
Wei Zou, Mingwen Dong, Miguel Romero Calvo, Wei Zou, Shuaichen Chang + 6 more
TLDR
Environment-injected memory poisoning (eTAMP) allows attackers to compromise LLM-based web agents across sessions and sites via a single observation.
Key contributions
- eTAMP is the first attack to achieve cross-session, cross-site compromise of LLM agents without direct memory access.
- A single manipulated observation (e.g., product page) silently poisons memory, activating later on different websites.
- Achieves substantial attack success rates: up to 32.5% on GPT-5-mini and 19.5% on GPT-OSS-120B.
- Discovers "Frustration Exploitation," where agents under environmental stress become up to 8x more susceptible.
Why it matters
This paper reveals a critical vulnerability in LLM-based web agents, showing how environmental observations can lead to persistent, cross-site compromises. It highlights the urgent need for new defenses, especially with the rise of AI browsers, as even capable models are not immune.
Original Abstract
Memory makes LLM-based web agents personalized, powerful, yet exploitable. By storing past interactions to personalize future tasks, agents inadvertently create a persistent attack surface that spans websites and sessions. While existing security research on memory assumes attackers can directly inject into memory storage or exploit shared memory across users, we present a more realistic threat model: contamination through environmental observation alone. We introduce Environment-injected Trajectory-based Agent Memory Poisoning (eTAMP), the first attack to achieve cross-session, cross-site compromise without requiring direct memory access. A single contaminated observation (e.g., viewing a manipulated product page) silently poisons an agent's memory and activates during future tasks on different websites, bypassing permission-based defenses. Our experiments on (Visual)WebArena reveal two key findings. First, eTAMP achieves substantial attack success rates: up to 32.5% on GPT-5-mini, 23.4% on GPT-5.2, and 19.5% on GPT-OSS-120B. Second, we discover Frustration Exploitation: agents under environmental stress become dramatically more susceptible, with ASR increasing up to 8 times when agents struggle with dropped clicks or garbled text. Notably, more capable models are not more secure. GPT-5.2 shows substantial vulnerability despite superior task performance. With the rise of AI browsers like OpenClaw, ChatGPT Atlas, and Perplexity Comet, our findings underscore the urgent need for defenses against environment-injected memory poisoning.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.