ArXiv TLDR

Diffusion-Guided Feature Selection via Nishimori Temperature: Noise-Based Spectral Embedding

arXiv:2604.24692

Vasiliy S. Usatyuk, Denis A. Sapozhnikov, Sergey I. Egorov

cs.LG

TLDR

NBSE uses a physics-informed spectral embedding and Nishimori temperature to robustly select informative features from high-dimensional data.

Key contributions

  • Introduces Noise-Based Spectral Embedding (NBSE) for non-greedy, physics-informed feature selection.
  • Uses Nishimori temperature to identify dominant diffusion modes, preventing hub dominance in graphs.
  • Applies spectral embedding in feature space to group redundant dimensions for representative selection.
  • Demonstrates robustness to noise and superior performance in retaining accuracy with fewer features.
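The central spectral step in the contributions above is locating the Nishimori temperature: the inverse temperature at which the Bethe Hessian of the similarity graph first becomes singular. A minimal sketch of that step, using the standard weighted Bethe Hessian and a bisection on its smallest eigenvalue (the exact estimator in the paper may differ; `bethe_hessian` and `nishimori_temperature` are illustrative helpers, not the authors' code):

```python
import numpy as np
from numpy.linalg import eigvalsh

def bethe_hessian(J, beta):
    """Weighted Bethe Hessian H(beta) for a symmetric coupling matrix J,
    using edge weights w_ij = tanh(beta * J_ij)."""
    w = np.tanh(beta * J)
    np.fill_diagonal(w, 0.0)
    off = w / (1.0 - w**2)                          # off-diagonal entries
    diag = 1.0 + (w**2 / (1.0 - w**2)).sum(axis=1)  # degree-corrected diagonal
    return np.diag(diag) - off

def nishimori_temperature(J, lo=1e-3, hi=5.0, iters=60):
    """Bisect for the beta at which the smallest eigenvalue of H(beta)
    crosses zero. Assumes H(lo) is positive-definite and H(hi) is not."""
    lam_min = lambda b: eigvalsh(bethe_hessian(J, b))[0]
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if lam_min(mid) > 0:   # still positive-definite: beta_N is larger
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For a k-regular graph with unit couplings, the smallest eigenvalue of H(beta) vanishes at tanh(beta) = 1/(k-1), which gives a quick sanity check for the bisection. The eigenvector at that critical beta is what NBSE uses as the dominant, hub-corrected diffusion mode.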

Why it matters

This paper offers a novel, physics-informed method for feature selection in high-dimensional data, avoiding computationally expensive greedy searches. By leveraging the Nishimori temperature and spectral embedding, it robustly identifies and selects the most informative features. On EfficientNet-B4 embeddings, the accuracy drop stays below 1% while retaining only 30% of features, outperforming ANOVA F-test and random selection by up to 6.8%.

Original Abstract

We propose Noise-Based Spectral Embedding (NBSE), a physics-informed framework for selecting informative features from high-dimensional data without greedy search. NBSE constructs a sparse similarity graph on the samples and identifies the Nishimori temperature $\beta_N$, the critical inverse temperature at which the Bethe Hessian becomes singular. The corresponding smallest eigenvector captures the dominant mode of an intrinsically degree-corrected diffusion process, naturally reweighting nodes to prevent hub dominance. By transposing the data matrix and applying NBSE in feature space, we obtain a one-dimensional spectral embedding that reveals groups of redundant or semantically related dimensions; balanced binning then selects one representative per group. We prove that coloured Gaussian perturbations shift $\beta_N$ by at most $O(\bar{\sigma}^2)$, guaranteeing robustness to measurement noise. Experiments on ImageNet embeddings from MobileNetV2 and EfficientNet-B4 show that NBSE preserves classification accuracy even under aggressive compression: on EfficientNet-B4 the accuracy drop is below $1\%$ when retaining only $30\%$ of features, outperforming ANOVA $F$-test and random selection by up to $6.8\%$.
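The abstract's final selection stage, balanced binning over the one-dimensional feature embedding, can be sketched as follows. The choice of the highest-variance feature as each bin's representative is an assumption for illustration; the paper states only that one representative per group is selected:

```python
import numpy as np

def balanced_binning_select(embedding, X, k):
    """Select k features from a 1-D feature-space embedding:
    sort features by embedding value, split into k balanced
    contiguous bins, keep one representative per bin.
    Representative rule (assumed): highest-variance feature."""
    order = np.argsort(embedding)      # features ordered along the embedding
    bins = np.array_split(order, k)    # k contiguous, near-equal-size groups
    var = X.var(axis=0)                # per-feature variance over samples
    return np.array([b[np.argmax(var[b])] for b in bins])
```

Because the bins partition the sorted feature order, the selected indices are distinct by construction, and each group of redundant (embedding-adjacent) dimensions contributes exactly one feature.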
