ArXiv TLDR

NeuroTrace: Inference Provenance-Based Detection of Adversarial Examples

arXiv: 2604.14457

Firas Ben Hmida, Philemon Hailemariam, Kashif Ali Khan, Birhanu Eshete

cs.CR

TLDR

NeuroTrace introduces Inference Provenance Graphs to detect adversarial examples by analyzing cross-layer information flow in DNNs, improving over prior graph-based baselines.

Key contributions

  • NeuroTrace framework uses Inference Provenance Graphs (IPGs) to analyze DNN inference behavior.
  • IPGs capture activation behavior and dataflow, offering a structured representation of information propagation.
  • IPG-based detectors deliver strong, transferable detection of adversarial examples across intra-attack, multi-attack, and cross-threat transfer settings, outperforming prior graph-based baselines.
  • The framework ships with an open dataset, a reproducible extraction pipeline, and a benchmark suite spanning vision and malware domains.
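To make the IPG idea concrete, here is a minimal sketch of extracting a provenance-style graph from a toy dense network's forward pass. This is not the paper's extraction engine; `build_ipg`, the thresholding rule `tau`, and the node/edge encoding are all illustrative assumptions.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def build_ipg(x, weights, tau=0.0):
    # Nodes are (layer, unit) pairs holding activation values; edges are
    # (src, dst, contribution) triples kept when the weight-scaled
    # contribution exceeds tau -- a rough proxy for the activation
    # behavior and parameter-induced dataflow an IPG records.
    nodes = {(0, j): float(v) for j, v in enumerate(x)}
    edges = []
    act = np.asarray(x, dtype=float)
    for layer, W in enumerate(weights, start=1):
        nxt = relu(W @ act)
        for j, v in enumerate(nxt):
            nodes[(layer, j)] = float(v)
            for i, a in enumerate(act):
                contrib = float(W[j, i] * a)
                if abs(contrib) > tau:
                    edges.append(((layer - 1, i), (layer, j), contrib))
        act = nxt
    return nodes, edges

# Toy 3 -> 4 -> 2 network with random weights.
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
nodes, edges = build_ipg(rng.standard_normal(3), weights, tau=0.1)
```

A real pipeline would instrument the framework's execution (e.g., via forward hooks) rather than re-implement the forward pass, and would emit a heterogeneous graph format consumable by a GNN.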

Why it matters

This paper tackles the inference-time opacity of DNNs with NeuroTrace, a framework for detecting adversarial examples from inference provenance. Provenance yields a strong, transferable signal for identifying malicious inputs, laying a foundation for more transparent and auditable ML systems.

Original Abstract

Deep neural networks (DNNs) remain largely opaque at inference time, limiting our ability to detect and diagnose malicious input manipulations such as adversarial examples. Existing detection methods predominantly rely on layer-local signals (e.g., activations or attribution scores), leaving cross-layer information flow and execution structure under-explored. We introduce NeuroTrace, a framework and open dataset for analyzing inference provenance through Inference Provenance Graphs (IPGs). IPGs are heterogeneous graphs that capture both activation behavior and parameter-induced dataflow during a model's forward pass, providing a structured representation of how information propagates through the network. NeuroTrace includes (i) a reproducible extraction engine that instruments model execution, (ii) a standardized graph representation compatible with heterogeneous GNNs, and (iii) a benchmark suite spanning multiple adversarial attack families across vision and malware domains. Using this framework, we evaluate IPG-based detectors for adversarial example detection under intra-attack, multi-attack, and cross-threat transfer settings. Our results show that inference provenance provides a strong and transferable signal for distinguishing adversarial and benign inputs, achieving consistently high detection performance and improving over prior graph-based baselines. We further analyze the conditions under which provenance-based detection generalizes across attack types, as well as the associated runtime and storage trade-offs. By releasing the dataset, extraction pipeline, and evaluation protocol, NeuroTrace enables systematic study of inference-time behavior and establishes inference provenance as a practical foundation for building more transparent and auditable machine learning systems.
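The abstract's detection setting can be caricatured with a tiny anomaly-style detector over IPG summary statistics. This is a toy stand-in, not the paper's heterogeneous-GNN detectors; `ipg_features`, `ZScoreDetector`, and the synthetic edge lists are all illustrative assumptions.

```python
import numpy as np

def ipg_features(edges):
    # Collapse an IPG edge list of (src, dst, contribution) triples into
    # simple dataflow statistics -- a toy stand-in for the graph-level
    # representations a heterogeneous GNN would learn.
    w = np.abs([e[2] for e in edges]) if edges else np.zeros(1)
    return np.array([len(edges), w.mean(), w.std(), w.max()])

class ZScoreDetector:
    # Flag inputs whose IPG features drift far from a benign profile.
    def fit(self, benign_feats):
        F = np.vstack(benign_feats)
        self.mu, self.sigma = F.mean(axis=0), F.std(axis=0) + 1e-8
        return self

    def score(self, feats):
        return float(np.max(np.abs((feats - self.mu) / self.sigma)))

    def is_adversarial(self, feats, k=3.0):
        return self.score(feats) > k

rng = np.random.default_rng(1)

def random_ipg(scale):
    # Synthetic IPGs: 20 edges with contributions drawn at a given scale;
    # larger scales mimic the perturbed dataflow of adversarial inputs.
    return [((0, i), (1, i), float(w)) for i, w in enumerate(rng.normal(0.0, scale, 20))]

det = ZScoreDetector().fit([ipg_features(random_ipg(1.0)) for _ in range(50)])
adv_score = det.score(ipg_features(random_ipg(10.0)))
```

Here a strongly perturbed input scores far outside the benign profile; the paper's evaluation instead trains graph-based detectors and tests them under intra-attack, multi-attack, and cross-threat transfer conditions.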
