APIOT: Autonomous Vulnerability Management Across Bare-Metal Industrial OT Networks

May 4, 2026 · arXiv:2605.02346

Adel ElZemity, Budi Arief, Shujun Li, Calvin Brierley, Yichao Wang + 4 more

cs.CR, cs.AI

TLDR

APIOT is the first LLM framework for autonomous attack and remediation of bare-metal industrial OT devices, achieving a full discovery-to-patch cycle.

Key contributions

APIOT is the first LLM framework for autonomous attack and remediation of bare-metal OT devices.
It achieves a full discovery, exploitation, patching, and verification cycle without step-by-step human intervention.
It reached a 90.0% mission success rate across 290 experiment runs on Zephyr RTOS firmware, spanning five frontier LLMs and three network topologies.
A runtime governance layer (overseer) is critical to prevent systematic agent failures.
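
The abstract names repetition loops as one of the failures an overseer has to catch. A minimal sketch of one such check follows. It is not APIOT's implementation: the `RepetitionGuard` class, the `(tool, arguments)` action shape, and the window and threshold values are all assumptions made for illustration.

```python
from collections import deque
from typing import Deque, Optional, Tuple

# Hypothetical action shape: (tool name, serialized arguments).
Action = Tuple[str, str]


class RepetitionGuard:
    """Flags an agent that keeps issuing the same action (illustrative only)."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        # Only the most recent `window` actions are kept.
        self.recent: Deque[Action] = deque(maxlen=window)
        self.max_repeats = max_repeats

    def check(self, action: Action) -> Optional[str]:
        """Record the action; return a warning once it repeats too often."""
        self.recent.append(action)
        repeats = sum(1 for past in self.recent if past == action)
        if repeats >= self.max_repeats:
            return (
                f"{action[0]} repeated {repeats} times in the last "
                f"{len(self.recent)} actions; try a different step"
            )
        return None
```

A governance layer of this kind would call `check` before each tool invocation and feed any warning back to the agent instead of executing the repeated action.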

Why it matters

This paper demonstrates that an LLM agent can autonomously exploit and remediate bare-metal OT devices. It implies that attacker expertise is no longer the binding constraint on exploiting such devices, so defenders must update their threat models for industrial firmware to include LLM-augmented adversaries.

Original Abstract

Bare-metal operational technology (OT) devices -- especially the microcontrollers running Modbus/TCP and CoAP at the base of industrial control systems -- have remained outside the reach of autonomous security attacks. Prior autonomous pentesting studies target Linux and web systems, whose shells and filesystems are familiar to LLM agents. Bare-metal OT has neither, so agents must reason directly over protocol fields and parser semantics. This requires new action-space designs and runtime controls, and opens new research questions about protocol-level exploit reasoning and its deployment envelope. We present APIOT (Autonomous Purple-teaming for Industrial OT), the first large language model (LLM) framework demonstrating an autonomous attack and remediation of bare-metal OT devices, achieving the full discovery -> exploitation -> patching -> verification cycle without step-by-step human intervention. We implemented and evaluated this framework on Zephyr RTOS firmware across heterogeneous industrial IoT (IIoT) topologies. Through 290 experiment runs spanning five frontier LLMs, three network topologies, two impairment levels, and guided versus unguided conditions, APIOT achieved a mission success rate of 90.0% on the full attack-remediation cycle. We found that the runtime governance layer (which we call an overseer) is a critical engineering variable: without it, agents exhibit systematic degenerate patterns, including repetition loops, missing crash verification, and reconnaissance deadlocks. Together, these findings carry two implications beyond our testbed. Attacker expertise is no longer the binding constraint on bare-metal OT exploitation, and defender threat models must now assume LLM-augmented adversaries capable of executing autonomous discovery-through-remediation cycles against industrial firmware.
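
To make "reasoning over protocol fields" concrete, here is a small sketch of the standard Modbus/TCP framing: a 7-byte MBAP header (transaction ID, protocol ID, length, unit ID) followed by a function code and data. It does not reproduce APIOT's action space. The function names are hypothetical, and the example only builds and inspects a benign Read Holding Registers request (function code 0x03).

```python
import struct


def build_read_holding_registers(transaction_id: int, unit_id: int,
                                 start_addr: int, quantity: int) -> bytes:
    """Build a Modbus/TCP Read Holding Registers (0x03) request frame."""
    # PDU: function code (1 byte), start address (2), register count (2).
    pdu = struct.pack(">BHH", 0x03, start_addr, quantity)
    # MBAP header: the length field counts the unit ID plus the PDU.
    mbap = struct.pack(">HHHB", transaction_id, 0, len(pdu) + 1, unit_id)
    return mbap + pdu


def parse_mbap(frame: bytes) -> dict:
    """Split a frame into header fields and check the declared length."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header plus function code")
    transaction_id, protocol_id, length, unit_id = struct.unpack(">HHHB", frame[:7])
    return {
        "transaction_id": transaction_id,
        "protocol_id": protocol_id,
        "length": length,
        "unit_id": unit_id,
        "function_code": frame[7],
        # The declared length covers everything after the first 6 bytes.
        "length_consistent": length == len(frame) - 6,
    }
```

A mismatch between the declared `length` and the actual byte count is the kind of field-level detail a parser must handle correctly, which is why `parse_mbap` reports it explicitly.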
