ArXiv TLDR

Structured Security Auditing and Robustness Enhancement for Untrusted Agent Skills

🐦 Tweet
2604.25109

Lijia Lv, Xuehai Tang, Jie Wen, Jizhong Han, Songlin Hu

cs.CRcs.AI

TLDR

This paper presents SkillGuard-Robust, a novel system for robustly auditing untrusted Agent Skills against malicious intent with high accuracy.

Key contributions

  • Formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task.
  • Introduces SkillGuard-Robust, combining role-aware evidence extraction, semantic verification, and adjudication.
  • Achieves 97.30% overall exact match and 98.33% malicious-risk recall on a 404-package dataset.
  • Demonstrates improved robustness for frozen and public-ecosystem Agent Skills through factorized auditing.

Why it matters

As AI agents become more prevalent, ensuring the security of their 'skills' is paramount. This paper addresses a critical gap in robustly auditing untrusted agent capabilities, preventing malicious code execution. Its high-accuracy system significantly enhances the trustworthiness of agent ecosystems.

Original Abstract

Agent Skills package SKILL.md files, scripts, reference documents, and repository context into reusable capability units, turning pre-load auditing from single-prompt filtering into cross-file security review. Existing guardrails often flag risk but recover malicious intent inconsistently under semantics-preserving rewrites. This paper formulates pre-load auditing for untrusted Agent Skills as a robust three-way classification task and introduces SkillGuard-Robust, which combines role-aware evidence extraction, selective semantic verification, and consistency-preserving adjudication. We evaluate SkillGuard-Robust on SkillGuardBench and two public-ecosystem extensions through five large evaluation views ranging from 254 to 404 packages. On the 404-package held-out aggregate, SkillGuard-Robust reaches 97.30% overall exact match, 98.33% malicious-risk recall, and 98.89% attack exact consistency. On the 254-package external-ecosystem view, it reaches 99.66%, 100.00%, and 100.00%, respectively. These results support a bounded conclusion: factorized package auditing materially improves frozen and public-ecosystem robustness, while harsher external-source transfer remains an open challenge.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.