EvoPatch-IoT: Evolution-Aware Cross-Architecture Vulnerability Retrieval and Patch-State Profiling for BusyBox-Based IoT Firmware
Yinhao Xiao, Huixi Li, Yongluo Shen
TLDR
EvoPatch-IoT is an evolution-aware cross-architecture retrieval framework that localizes potentially vulnerable functions and profiles patch state in stripped BusyBox IoT firmware, outperforming the strongest baseline on both Hit@1 and Hit@10.
Key contributions
- Proposes EvoPatch-IoT, a framework for cross-architecture vulnerability retrieval in stripped BusyBox binaries.
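- Combines anonymous instruction/context features, graph-level statistics, per-binary geometric priors, and historical function prototypes to localize homologous and potentially vulnerable functions without symbols, source paths, or version strings at test time.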
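- Achieves a weighted Hit@1 of 34.56% and Hit@10 of 56.24% across 57 versions and 1,020 directed architecture pairs, outperforming the strongest baseline by 16.04% and 26.85%, respectively.
- Reduces the expected manual inspection space by 98.98%; a CVE-2021-42386 patch-state proxy reaches 82.44% mean accuracy and 88.47% mean F1 across held-out architectures.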
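To make the Hit@k figures above concrete, here is a minimal Python sketch of how such a metric is commonly computed from the rank of the correct match for each query function. The function names, the example ranks, and the pooling used for the weighted variant (each architecture pair weighted by its number of queries) are assumptions for illustration; the paper's exact weighting scheme is not specified in this summary.

```python
from typing import Sequence


def hit_at_k(ranks: Sequence[int], k: int) -> float:
    """Fraction of queries whose true match is ranked at position k or better (1-based ranks)."""
    if not ranks:
        return 0.0
    return sum(1 for r in ranks if r <= k) / len(ranks)


def weighted_hit_at_k(groups: Sequence[Sequence[int]], k: int) -> float:
    """Hit@k pooled over groups, so each group counts in proportion to its query count.

    This weighting is an assumption, not necessarily the one used in the paper.
    """
    total = sum(len(g) for g in groups)
    if total == 0:
        return 0.0
    hits = sum(1 for g in groups for r in g if r <= k)
    return hits / total


# Hypothetical ranks of the correct function for two directed architecture pairs.
pair_a = [1, 3, 12, 1]
pair_b = [2, 1]

print(hit_at_k(pair_a, 1))                       # Hit@1 for one pair
print(weighted_hit_at_k([pair_a, pair_b], 10))   # pooled Hit@10 over both pairs
```

Under this reading, Hit@1 measures how often the top-ranked candidate is already the correct function, while Hit@10 measures how often an analyst would find it within the first ten candidates.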
Why it matters
Security assessment of BusyBox in IoT firmware is difficult because binaries are frequently stripped, vendor patch practices are inconsistent, and the same source is compiled for many architectures. EvoPatch-IoT offers a practical foundation for scalable vulnerability auditing, helping analysts locate potentially vulnerable functions and estimate patch state in widely deployed IoT devices.
Original Abstract
BusyBox is one of the most widely reused userland components in Linux-based Internet-of-Things (IoT) firmware, yet its security assessment remains difficult because firmware images are frequently stripped, vendor patch practices are inconsistent, and the same source component is compiled for heterogeneous architectures. We propose EvoPatch-IoT, an evolution-aware cross-architecture retrieval framework for stripped BusyBox firmware binaries. EvoPatch-IoT combines anonymous instruction/context features, graph-level statistics, per-binary geometric priors, and historical function prototypes to localize homologous and potentially vulnerable functions without relying on symbols, source paths, or version strings at test time. We further construct a large-scale BusyBox benchmark from 57 historical versions, 270 unstripped binaries, 285 stripped binaries, and 130 source releases, yielding 1,550,752 function-symbol rows, 1,290,369 analysis-function rows, and 155,845 high-confidence stripped-to-unstripped matches. On 57 fully covered versions and 1,020 directed architecture pairs, EvoPatch-IoT achieves a weighted Hit@1 of 34.56% and Hit@10 of 56.24%, outperforming the strongest baseline by 16.04% and 26.85%, respectively, and reducing the expected manual inspection space by 98.98%. The method is best on 56 of 57 versions and maintains consistent advantages on difficult architecture pairs. In addition, a version-change transfer study reaches a mean ROC-AUC of 0.9887, and a CVE-2021-42386 patch-state proxy obtains 82.44% mean accuracy and 88.47% mean F1 across held-out architectures. These results show that evolution-aware binary retrieval is a practical foundation for scalable IoT firmware vulnerability auditing.
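As a rough illustration of what retrieval over stripped functions can look like, the sketch below ranks candidate functions in a target binary by cosine similarity to a query function's feature vector. Everything here is an assumption for illustration: the three-dimensional vectors, the `sub_*` function names, and the use of plain cosine similarity are hypothetical stand-ins, not the features or scoring that EvoPatch-IoT uses.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_candidates(
    query: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Return the top_k candidate functions sorted by descending similarity to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in candidates.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical feature vectors for a known-vulnerable function (query)
# and for anonymous functions in a stripped binary of another architecture.
query_vec = [1.0, 0.0, 2.0]
stripped_functions = {
    "sub_4010": [1.0, 0.1, 2.1],
    "sub_4a20": [0.0, 3.0, 0.0],
    "sub_5b00": [2.0, 0.0, 1.0],
}

for name, score in rank_candidates(query_vec, stripped_functions, top_k=2):
    print(f"{name}: {score:.4f}")
```

An analyst would then inspect only the short ranked list rather than every function in the binary, which is the intuition behind the reported reduction in manual inspection space.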