Operational Feature Fingerprints of Graph Datasets via a White-Box Signal-Subspace Probe
Yuchen Xiong, Swee Keong Yeap, Zhen Hong Ban
TLDR
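WG-SRC is a white-box signal-subspace probe that diagnoses graph datasets by decomposing classifier behavior into operational feature fingerprints, indicating which feature-level mechanisms (raw features, low-pass smoothing, high-pass differences, class geometry, ridge boundaries) each dataset requires.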
Key contributions
- Introduces WG-SRC, a white-box signal-subspace probe for graph dataset diagnosis and prediction.
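- Replaces learned GNN message passing with a fixed, named dictionary of raw features, row- and symmetric-normalized low-pass propagation, and high-pass graph differences.
- Decomposes each dataset's behavior into operational feature fingerprints with raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components.
- Treats these fingerprints as intrinsic classifier outputs, not post-hoc explanations, and uses them to guide post-evaluation analysis and dataset-specific mechanistic interventions.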
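The fixed signal dictionary described above can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `graph_signal_dictionary`, the block names, the dense adjacency matrix, the added self-loops, and the use of X - PX as the high-pass difference are all illustrative choices.

```python
import numpy as np


def graph_signal_dictionary(A, X):
    """Build named raw / low-pass / high-pass blocks (hypothetical sketch).

    A: (n, n) dense symmetric adjacency matrix; X: (n, d) node features.
    Assumption: self-loops are added before normalization.
    """
    n = A.shape[0]
    A_hat = A + np.eye(n)                      # assumption: add self-loops
    deg = A_hat.sum(axis=1)

    # Row-normalized propagation D^-1 A_hat (each row averages neighbors).
    P_row = A_hat / deg[:, None]
    # Symmetric-normalized propagation D^-1/2 A_hat D^-1/2.
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    P_sym = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]

    low_row = P_row @ X
    low_sym = P_sym @ X
    return {
        "raw": X,
        "low_row": low_row,
        "low_sym": low_sym,
        # Assumption: high-pass blocks are (I - P) X for each propagation P.
        "high_row": X - low_row,
        "high_sym": X - low_sym,
    }
```

Stacking the blocks, for example with `np.hstack(list(D.values()))`, gives the feature matrix a downstream linear classifier would see; keeping them as named blocks is what allows each fingerprint component to be traced back to a specific signal.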
Why it matters
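Graph neural networks (GNNs) are often black boxes: learned message passing mixes ego features, neighborhood smoothing, high-pass differences, and decision boundaries in one opaque representation. This paper introduces WG-SRC, a white-box tool whose predictions are built from explicit, named graph signals, so it can reveal which feature-level mechanisms a graph dataset requires. This transparency supports targeted, dataset-specific modifications, such as removing high-pass blocks that act as noise or preserving raw features where they matter.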
Original Abstract
Graph neural networks achieve strong node-classification accuracy, but their learned message passing entangles ego attributes, neighborhood smoothing, high-pass graph differences, class geometry, and classifier boundaries in an opaque representation. This obscures why a node is classified as it is and what feature-level graph-learning mechanisms a dataset requires. We propose WG-SRC, a white-box signal-subspace probe for prediction and graph dataset diagnosis. WG-SRC replaces learned message passing with a fixed, named graph-signal dictionary of raw features, row-normalized and symmetric-normalized low-pass propagation, and high-pass graph differences. It combines Fisher coordinate selection, class-wise PCA subspaces, closed-form multi-alpha ridge classification, and validation-based score fusion, so prediction and analysis use explicit class subspaces, energy-controlled dimensions, and closed-form linear decisions. As a white-box graph-learning instrument, WG-SRC uses predictive performance to validate its diagnostics: across six node-classification datasets, the scaffold remains competitive with reproduced graph baselines and achieves a positive average gain under aligned splits. Its atlas, produced by the predictor itself, decomposes behavior into raw-feature, low-pass, high-pass, class-geometric, and ridge-boundary components. These operational feature fingerprints distinguish low-pass-dominated Amazon graphs, Chameleon behavior that mixes high-pass signals with complex class geometry, and raw- or boundary-sensitive WebKB graphs. As intrinsic classifier outputs rather than post-hoc explanations, these fingerprints provide post-evaluation guidance for later analysis and dataset-specific modification. Aligned mechanistic interventions support this guidance by indicating when high-pass blocks act as removable noise, when raw features should be preserved, and when ridge-type boundary correction matters.
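The prediction pipeline named in the abstract (Fisher coordinate selection, class-wise PCA subspaces with energy-controlled dimensions, closed-form ridge classification over several alphas, and validation-based score fusion) can be illustrated with the NumPy sketch below. It is a hypothetical reconstruction, not the authors' code: the function names, the residual-norm subspace score, the bias column in the ridge head, the row-wise z-scoring before fusion, and the grid search over one alpha and one fusion weight are all assumptions.

```python
import numpy as np


def fisher_scores(Z, y):
    """Per-coordinate Fisher ratio: between-class over within-class scatter."""
    mu = Z.mean(axis=0)
    between = np.zeros(Z.shape[1])
    within = np.zeros(Z.shape[1])
    for c in np.unique(y):
        Zc = Z[y == c]
        mc = Zc.mean(axis=0)
        between += len(Zc) * (mc - mu) ** 2
        within += ((Zc - mc) ** 2).sum(axis=0)
    return between / (within + 1e-12)


def class_subspaces(Z, y, energy):
    """Per class: mean plus the fewest PCA directions reaching `energy` of the variance."""
    bases = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        mean = Zc.mean(axis=0)
        _, s, Vt = np.linalg.svd(Zc - mean, full_matrices=False)
        ratio = np.cumsum(s ** 2) / max(float((s ** 2).sum()), 1e-12)
        k = int(np.searchsorted(ratio, energy)) + 1  # energy-controlled dimension
        bases[c] = (mean, Vt[:k])
    return bases


def subspace_scores(Z, bases, classes):
    """Assumption: score = minus the residual norm after projecting onto each class subspace."""
    out = np.empty((Z.shape[0], len(classes)))
    for j, c in enumerate(classes):
        mean, V = bases[c]
        R = Z - mean
        R = R - (R @ V.T) @ V
        out[:, j] = -np.linalg.norm(R, axis=1)
    return out


def with_bias(M):
    """Append a constant column (assumption: the ridge head uses an intercept)."""
    return np.hstack([M, np.ones((M.shape[0], 1))])


def ridge_scores(Z_tr, y_tr, Z_ev, classes, alpha):
    """Closed-form ridge on one-hot targets: W = (Z^T Z + alpha I)^-1 Z^T Y."""
    Zb = with_bias(Z_tr)
    Y = (y_tr[:, None] == classes[None, :]).astype(float)
    W = np.linalg.solve(Zb.T @ Zb + alpha * np.eye(Zb.shape[1]), Zb.T @ Y)
    return with_bias(Z_ev) @ W


def zscore_rows(S):
    """Assumption: put both heads on a comparable scale before fusing."""
    return (S - S.mean(axis=1, keepdims=True)) / (S.std(axis=1, keepdims=True) + 1e-12)


def fit_predict(Z_tr, y_tr, Z_va, y_va, Z_te, top_k=16, energy=0.9,
                alphas=(0.1, 1.0, 10.0), weights=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Select coordinates, build both heads, and pick (alpha, weight) on validation."""
    keep = np.argsort(fisher_scores(Z_tr, y_tr))[::-1][:top_k]
    Z_tr, Z_va, Z_te = Z_tr[:, keep], Z_va[:, keep], Z_te[:, keep]
    classes = np.unique(y_tr)
    bases = class_subspaces(Z_tr, y_tr, energy)

    def fused(Z_ev, alpha, w):
        s_sub = zscore_rows(subspace_scores(Z_ev, bases, classes))
        s_rdg = zscore_rows(ridge_scores(Z_tr, y_tr, Z_ev, classes, alpha))
        return w * s_sub + (1 - w) * s_rdg

    best_acc, best_cfg = -1.0, None
    for alpha in alphas:
        for w in weights:
            acc = float(np.mean(classes[fused(Z_va, alpha, w).argmax(axis=1)] == y_va))
            if acc > best_acc:
                best_acc, best_cfg = acc, (alpha, w)
    pred = classes[fused(Z_te, *best_cfg).argmax(axis=1)]
    return pred, best_cfg, best_acc
```

Every step in this sketch is closed-form (one SVD per class and one linear solve per alpha), so the selected coordinates, the per-class dimensions `k`, and the ridge weights `W` all remain directly inspectable, which is the property a white-box probe of this kind relies on.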