ArXiv TLDR

Dissociating spatial frequency reliance from adversarial robustness advantages in neurally guided deep convolutional neural networks

arXiv: 2605.04443

Zhenan Shao, Tianyu Ren, Chengxiao Wang, Leyla Isik, Diane M. Beck

q-bio.NC · cs.AI

TLDR

Neurally aligned DCNNs' adversarial robustness isn't primarily driven by spatial frequency reliance, but by learning more human-like representations.

Key contributions

  • Neurally aligned DCNNs increase reliance on low spatial frequencies (LSF) and the "human channel."
  • Directly biasing models towards the "human channel" impairs adversarial robustness.
  • LSF bias yields only modest robustness gains despite large spatial frequency shifts (a band-pass filtering sketch follows this list).
  • Spatial-frequency-biased models show little, if any, increase in similarity to human neural representational geometry.
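
To make the biasing idea concrete, here is a minimal NumPy sketch of one common way to push a model toward a chosen spatial-frequency band: masking the 2D Fourier spectrum of each input image before it reaches the network. The cutoff values and the hard binary mask are illustrative assumptions, not the paper's actual protocol.

```python
# Minimal sketch (illustrative only, not the paper's protocol): bias a model's
# inputs toward a chosen spatial-frequency band by masking the 2D Fourier
# spectrum of each image before training or evaluation.
import numpy as np

def bandpass_filter(image, low_cpi=None, high_cpi=None):
    """Keep only frequencies in [low_cpi, high_cpi], in cycles per image.

    low_cpi=None  -> no lower cutoff (pure low-pass when high_cpi is set)
    high_cpi=None -> no upper cutoff (pure high-pass when low_cpi is set)
    """
    h, w = image.shape[:2]
    fy = np.fft.fftfreq(h) * h                 # vertical frequency, cycles/image
    fx = np.fft.fftfreq(w) * w                 # horizontal frequency, cycles/image
    radius = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)

    keep = np.ones((h, w), dtype=bool)         # hard binary mask (an assumption)
    if low_cpi is not None:
        keep &= radius >= low_cpi
    if high_cpi is not None:
        keep &= radius <= high_cpi

    img = image.astype(float)
    if img.ndim == 2:                          # promote grayscale to H x W x 1
        img = img[..., None]
    spectrum = np.fft.fft2(img, axes=(0, 1))
    filtered = np.fft.ifft2(spectrum * keep[..., None], axes=(0, 1)).real
    return filtered.squeeze()

# Hypothetical cutoffs: a low-pass (LSF) bias vs. a mid-frequency band.
# lsf_img = bandpass_filter(img, high_cpi=8)
# mid_img = bandpass_filter(img, low_cpi=4, high_cpi=16)
```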

Why it matters

This paper challenges the assumption that spatial frequency reliance is the primary driver of adversarial robustness in neurally aligned DCNNs. It suggests that such reliance is an emergent property of learning more human-like representations, motivating new research directions beyond spectral profiles to understand robust visual processing.

Original Abstract

Deep convolutional neural networks (DCNNs) have rivaled humans on many visual tasks, yet they remain vulnerable to near-imperceptible perturbations generated by adversarial attacks. Recent work shows that aligning DCNN representations with human visual cortex activity improves adversarial robustness, but the mechanisms driving this advantage are unclear. One hypothesis suggests that neural alignment confers robustness by biasing models away from brittle high-frequency details and towards the low spatial frequencies (LSF). However, recent work shows that human object recognition critically depends on a narrow, mid-frequency "human channel". Interestingly, this band was partially preserved in prior LSF-focused studies. Here, we investigate whether a spectral bias towards the LSF or the human channel is the primary driver of the adversarial robustness observed in neurally aligned DCNNs. We first show that DCNNs aligned to higher-order regions of the human ventral visual stream systematically increase reliance on both LSF and the human channel. However, directly steering DCNNs towards these bands revealed a clear dissociation. Biasing models towards the human channel, either alone or together with LSF, does not improve robustness and even impairs it. LSF bias produced some robustness gains, but such improvements are modest despite inducing much larger shifts in spatial-frequency reliance than neurally aligned models. Spatial-frequency-biased models overall show little, if any, increase in similarity to human neural representational geometry. Together, our results suggest that altered spatial-frequency reliance is likely an emergent property of learning more human-like representations rather than the primary mechanism by which neural alignment confers adversarial robustness, and motivate the need for future research examining representational properties beyond spatial-frequency profiles.
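
The abstract's "similarity to human neural representational geometry" is the kind of comparison typically quantified with representational similarity analysis (RSA). The sketch below is an illustrative version of that analysis, not necessarily the paper's exact pipeline; the variable names and the fMRI-style response patterns are assumptions.

```python
# Illustrative RSA sketch (assumed analysis, not necessarily the paper's):
# correlate a model's representational dissimilarity matrix (RDM) with a
# neural RDM computed over the same set of stimuli.
import numpy as np
from scipy.stats import spearmanr

def rdm(patterns):
    """patterns: (n_stimuli, n_features) -> (n_stimuli, n_stimuli) RDM,
    using 1 - Pearson correlation as the dissimilarity measure."""
    return 1.0 - np.corrcoef(patterns)

def rsa_score(model_feats, neural_resps):
    """Spearman correlation between the upper triangles of the two RDMs."""
    iu = np.triu_indices(model_feats.shape[0], k=1)   # ignore the diagonal
    rho, _ = spearmanr(rdm(model_feats)[iu], rdm(neural_resps)[iu])
    return rho

# Hypothetical usage: model_feats are DCNN layer activations and neural_resps
# are recorded responses (e.g., fMRI voxels) to the same n stimuli.
# score = rsa_score(model_feats, neural_resps)
```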
