
Agentic Vulnerability Reasoning on Windows COM Binaries

arXiv:2605.05000

Hwiwon Lee, Jongseong Kim, Lingming Zhang

cs.CR, cs.LG

TLDR

SLYP is an agentic pipeline that finds race condition vulnerabilities in Windows COM binaries and generates verified proof-of-concept exploits.

Key contributions

  • SLYP is an agentic pipeline for discovering race conditions in Windows COM binaries.
  • It generates debugger-verified proof-of-concept (PoC) code for these vulnerabilities.
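  • It achieves 0.973 F1 on a benchmark of 20 COM objects covering 40 vulnerability cases, outperforming production coding agents by up to 0.208 F1 and the state-of-the-art static analyzer by 3.3x in bug discovery.
  • It discovered 28 previously unknown vulnerabilities across nine production Windows COM services, all confirmed by the Microsoft Security Response Center (MSRC), with 16 CVEs assigned and $140,000 in bounties.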
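
For readers unfamiliar with the metric, F1 is the harmonic mean of precision and recall over detected bugs. The sketch below shows the standard formula only. The function name `f1_score` and the counts passed to it are illustrative assumptions, not figures or code from the paper, and the paper may aggregate its score differently.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1, the harmonic mean of precision and recall.

    tp: true positives (real bugs reported)
    fp: false positives (reports that are not real bugs)
    fn: false negatives (real bugs that were missed)
    """
    if tp == 0:
        # No correct detections: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only; NOT taken from the paper.
print(round(f1_score(tp=9, fp=1, fn=1), 3))  # prints 0.9
```

Because F1 penalizes both missed bugs and false alarms, a tool cannot score well by simply reporting everything or by reporting very little.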

Why it matters

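Windows COM services run with elevated privileges and are widely accessible to authenticated users, which makes race conditions in them a critical attack surface for local privilege escalation. SLYP automates both the discovery of these race conditions and the generation of debugger-verified PoCs, a step where production coding agents in their default setup verify essentially no cases. Its results on production Windows services (28 previously unknown vulnerabilities, 16 CVEs, and $140,000 in bounties) show that the approach works beyond the benchmark.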

Original Abstract

Windows Component Object Model (COM) services run with elevated privileges and are widely accessible to authenticated users, making race conditions in these binaries a critical surface for local privilege escalation. We present SLYP, an end-to-end agentic pipeline that discovers race condition vulnerabilities in COM binaries and generates debugger-verified proof-of-concept (PoC) code. SLYP exposes binary exploration, COM inspection, and dynamic debugging as reusable tool interfaces, giving agents the static context, COM activation metadata, and debugger feedback needed to move from vulnerability discovery to verified PoC generation. On a benchmark of 20 COM objects covering 40 vulnerability cases, SLYP achieves 0.973 F1, outperforming production coding agents by up to 0.208 F1 and the state-of-the-art static analyzer by 3.3x in bug discovery. For PoC generation, production coding agents in their default setup (without our COM inspection and dynamic debugging tools) verify essentially no cases on either frontier model, whereas SLYP's interactive toolsets enable it to autonomously synthesize working PoCs for 67.5% of cases on the strongest configuration. Deployed on production Windows services, SLYP discovers 28 previously unknown vulnerabilities across nine COM services, all confirmed by the Microsoft Security Response Center (MSRC) with 16 CVEs assigned and $140,000 in bounties. Furthermore, SLYP is designed with generalizable binary analysis and debugging interfaces, making it readily applicable to other commercial off-the-shelf (COTS) binaries beyond Windows COM services.
