Supply-Chain Poisoning Attacks Against LLM Coding Agent Skill Ecosystems
Yubin Qu, Yi Liu, Tongcheng Geng, Gelei Deng, Yuekang Li + 3 more
TLDR
A new supply-chain attack, DDIPE, poisons LLM coding agent skills by hiding malicious logic in documentation examples, bypassing strong defenses.
Key contributions
- Introduces Document-Driven Implicit Payload Execution (DDIPE) for LLM coding agent supply-chain attacks.
- DDIPE embeds malicious logic in skill documentation examples, executed when agents reuse them.
- Achieves 11.6-33.5% bypass rates against strong defenses, while explicit attacks fail.
- Identified 2.5% of attacks that evade both static analysis and LLM alignment safeguards.
Why it matters
This work reveals a critical vulnerability in LLM coding agent skill ecosystems, where documentation itself can be weaponized. It demonstrates that current defenses are insufficient against implicit payload execution. The findings highlight an urgent need for enhanced security measures in agent marketplaces and skill development.
Original Abstract
LLM-based coding agents extend their capabilities via third-party agent skills distributed through open marketplaces without mandatory security review. Unlike traditional packages, these skills are executed as operational directives with system-level privileges, so a single malicious skill can compromise the host. Prior work has not examined whether supply-chain attacks can directly hijack an agent's action space, such as file writes, shell commands, and network requests, despite existing safeguards. We introduce Document-Driven Implicit Payload Execution (DDIPE), which embeds malicious logic in code examples and configuration templates within skill documentation. Because agents reuse these examples during normal tasks, the payload executes without explicit prompts. Using an LLM-driven pipeline, we generate 1,070 adversarial skills from 81 seeds across 15 MITRE ATTACK categories. Across four frameworks and five models, DDIPE achieves 11.6% to 33.5% bypass rates, while explicit instruction attacks achieve 0% under strong defenses. Static analysis detects most cases, but 2.5% evade both detection and alignment. Responsible disclosure led to four confirmed vulnerabilities and two fixes.
📬 Weekly AI Paper Digest
Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.