Untrained CNNs Match Backpropagation at V1: A Systematic RSA Comparison of Four Learning Rules Against Human fMRI
TLDR
Untrained CNNs achieve V1/V2 alignment with human fMRI data comparable to backpropagation-trained CNNs, highlighting architecture's dominant role in early visual processing.
Key contributions
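- Compared four learning rules (BP, FA, PC, and STDP) on identical CNN architectures against human fMRI data (THINGS-fMRI) using RSA, with an untrained random-weights baseline.
- Untrained CNNs achieve V1/V2 alignment statistically indistinguishable from backpropagation-trained CNNs (rho = 0.071 vs. 0.072, p = 0.43).
- Learning rules differentiate only at higher visual areas, where BP achieves the strongest LOC/IT alignment.
- Predictive coding (PC) with local Hebbian updates achieves IT alignment statistically indistinguishable from BP (p = 0.18).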
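The summary credits PC with "local Hebbian updates" but does not specify which PC variant was used. The following is a minimal sketch assuming a common supervised predictive-coding scheme on a small dense network (not the paper's CNN): the input and output layers are clamped, the hidden layer relaxes to reduce prediction errors, and each weight matrix is then updated by the outer product of its post-synaptic prediction error and its pre-synaptic activity. The function `pc_step`, the tanh nonlinearity, and all step sizes are hypothetical choices. In this sketch the relaxation step still routes output errors back through `W2.T`; only the weight updates are local.

```python
import numpy as np


def f(x):
    """Activation function (tanh is an assumed choice)."""
    return np.tanh(x)


def df(x):
    """Derivative of tanh."""
    return 1.0 - np.tanh(x) ** 2


def pc_step(W1, W2, x0, target, n_infer=20, lr_x=0.1, lr_w=0.01):
    """One hypothetical PC training step on a 3-layer network x0 -> x1 -> x2."""
    # Hidden layer starts at its feedforward prediction, so eps1 = 0 initially
    x1 = W1 @ f(x0)
    # Output layer is clamped to the target during learning
    x2 = target
    # Inference: relax x1 by gradient descent on E = 0.5*|eps1|^2 + 0.5*|eps2|^2
    for _ in range(n_infer):
        eps1 = x1 - W1 @ f(x0)  # prediction error at the hidden layer
        eps2 = x2 - W2 @ f(x1)  # prediction error at the output layer
        x1 = x1 + lr_x * (-eps1 + df(x1) * (W2.T @ eps2))
    # Prediction errors at the relaxed state
    eps1 = x1 - W1 @ f(x0)
    eps2 = x2 - W2 @ f(x1)
    # Local updates: each weight change uses only the error of the layer it
    # predicts (post-synaptic) and the activity of the layer it reads (pre-synaptic)
    W1 = W1 + lr_w * np.outer(eps1, f(x0))
    W2 = W2 + lr_w * np.outer(eps2, f(x1))
    return W1, W2
```

Because both the relaxation and the weight updates descend the same energy, repeated calls on a fixed input-target pair should reduce the network's feedforward output error, without any explicit backward pass over the weights.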
Why it matters
This paper shows that CNN architecture, not the learning rule, primarily determines alignment with early visual cortex (V1/V2): an untrained network already matches backpropagation there. The inherent structure of CNNs therefore accounts for the early-area alignment these models reach, while learning rules become decisive only for higher-level areas (LOC/IT). This implies that alignment between artificial networks and biological vision should be assessed region by region, against an untrained baseline, before crediting any learning rule.
Original Abstract
A central question in computational neuroscience is whether the learning rule used to train a neural network determines how well its internal representations align with those of the human visual cortex. We present a systematic comparison of four learning rules -- backpropagation (BP), feedback alignment (FA), predictive coding (PC), and spike-timing-dependent plasticity (STDP) -- applied to identical convolutional architectures and evaluated against human fMRI data from the THINGS-fMRI dataset (720 stimuli, 3 subjects) using Representational Similarity Analysis (RSA). Crucially, we include an untrained random-weights baseline that reveals the dominant role of architecture. We find that early visual alignment (V1/V2) is primarily architecture-driven: an untrained CNN achieves rho = 0.071, statistically indistinguishable from BP (rho = 0.072, p = 0.43). Learning rules only differentiate at higher visual areas: BP dominates at LOC/IT, and PC with local Hebbian updates achieves IT alignment statistically indistinguishable from BP (p = 0.18). FA consistently impairs representations below the random baseline at V1. Partial RSA confirms all effects survive pixel-similarity control. These results demonstrate that the relationship between learning rules and cortical alignment is region-specific: architecture determines early alignment, while supervised objectives drive late alignment.
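The RSA pipeline described in the abstract can be sketched as follows. This is a minimal sketch, not the authors' code: it assumes representational dissimilarity matrices (RDMs) built as 1 minus the Pearson correlation between stimulus response patterns, model-brain agreement measured as Spearman's rho over the RDMs' upper triangles, and partial RSA implemented as a partial Spearman correlation that removes the contribution of a pixel-space RDM. The function names `rdm`, `rsa`, and `partial_rsa` are hypothetical.

```python
import numpy as np
from scipy.stats import rankdata, spearmanr


def rdm(features):
    """RDM: 1 - Pearson r between the response patterns of every stimulus pair.

    features: array of shape (n_stimuli, n_units_or_voxels).
    """
    return 1.0 - np.corrcoef(features)


def upper(m):
    """Vectorize the upper triangle of a square matrix, excluding the diagonal."""
    i, j = np.triu_indices(m.shape[0], k=1)
    return m[i, j]


def rsa(model_feats, brain_feats):
    """Spearman rho between the model RDM and the brain RDM."""
    rho, _ = spearmanr(upper(rdm(model_feats)), upper(rdm(brain_feats)))
    return rho


def partial_rsa(model_feats, brain_feats, pixel_feats):
    """Partial Spearman rho of model vs. brain RDMs, controlling for a pixel RDM."""
    # Partial Spearman = partial Pearson correlation computed on ranks
    a, b, c = (rankdata(upper(rdm(x))) for x in (model_feats, brain_feats, pixel_feats))
    r = np.corrcoef([a, b, c])
    r_ab, r_ac, r_bc = r[0, 1], r[0, 2], r[1, 2]
    return (r_ab - r_ac * r_bc) / np.sqrt((1 - r_ac**2) * (1 - r_bc**2))
```

In the paper's setting, `model_feats` would be one layer's activations for the 720 THINGS stimuli and `brain_feats` the voxel responses from one region of interest; both inputs are assumed here. A model effect that survives `partial_rsa` cannot be explained by low-level pixel similarity alone.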