ArXiv TLDR

Heterogeneous Connectivity in Sparse Networks: Fan-in Profiles, Gradient Hierarchy, and Topological Equilibria

arXiv:2604.10560

Nikodem Tomczak

cs.LG cs.NE

TLDR

Heterogeneous connectivity in Profiled Sparse Networks (PSN) doesn't improve accuracy when hub placement is arbitrary, but optimization-driven hub placement boosts dynamic sparse training.

Key contributions

  • PSN replaces uniform connectivity with deterministic, heterogeneous fan-in profiles (see the sketch after this list).
  • Static PSN with arbitrary hub placement offers no accuracy benefit over uniform sparsity.
  • Structured profiles concentrate gradients 2-5x at hub neurons, with hierarchy strength predicted by the fan-in coefficient of variation (r = 0.93).
  • Lognormal PSN initialisation improves RigL dynamic sparse training, outperforming ERK on harder tasks.
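
The fan-in profiles themselves are easy to reproduce. Below is a minimal sketch, assuming numpy and a 90% sparsity target, of building one layer's connectivity mask with a lognormal fan-in profile; the function name, the sigma parameter, and the uniformly random choice of inputs per neuron are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical PSN-style mask builder (illustrative, not the paper's code).
import numpy as np

def lognormal_fanin_mask(n_in, n_out, sparsity=0.9, sigma=1.0, seed=0):
    rng = np.random.default_rng(seed)
    edge_budget = int(round((1.0 - sparsity) * n_in * n_out))

    # Draw a heterogeneous fan-in per output neuron, rescaled to the edge budget.
    raw = rng.lognormal(mean=0.0, sigma=sigma, size=n_out)
    fanin = np.clip(np.round(raw / raw.sum() * edge_budget), 1, n_in).astype(int)

    # Arbitrary hub placement: each neuron's inputs are chosen uniformly at random.
    mask = np.zeros((n_out, n_in), dtype=bool)
    for j, k in enumerate(fanin):
        mask[j, rng.choice(n_in, size=k, replace=False)] = True
    return mask

mask = lognormal_fanin_mask(784, 300, sparsity=0.9)
fanin = mask.sum(axis=1)
print(mask.mean(), fanin.std() / fanin.mean())  # density ~0.1, fan-in coefficient of variation
```

Sweeping sigma (or swapping the lognormal for one of the paper's other parametric families) varies the fan-in coefficient of variation, which the study sweeps from 0 to 2.5.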

Why it matters

This research clarifies the role of heterogeneous connectivity in sparse neural networks. It shows that simply having varied connectivity isn't enough; the optimization-driven placement of highly connected "hub" neurons is crucial for performance gains, especially in dynamic sparse training. This guides future sparse network design.

Original Abstract

Profiled Sparse Networks (PSN) replace uniform connectivity with deterministic, heterogeneous fan-in profiles defined by continuous, nonlinear functions, creating neurons with both dense and sparse receptive fields. We benchmark PSN across four classification datasets spanning vision and tabular domains, input dimensions from 54 to 784, and network depths of 2--3 hidden layers. At 90% sparsity, all static profiles, including the uniform random baseline, achieve accuracy within 0.2-0.6% of dense baselines on every dataset, demonstrating that heterogeneous connectivity provides no accuracy advantage when hub placement is arbitrary rather than task-aligned. This result holds across sparsity levels (80-99.9%), profile shapes (eight parametric families, lognormal, and power-law), and fan-in coefficients of variation from 0 to 2.5. Internal gradient analysis reveals that structured profiles create a 2-5x gradient concentration at hub neurons compared to the ~1x uniform distribution in random baselines, with the hierarchy strength predicted by fan-in coefficient of variation ($r = 0.93$). When PSN fan-in distributions are used to initialise RigL dynamic sparse training, lognormal profiles matched to the equilibrium fan-in distribution consistently outperform standard ERK initialisation, with advantages growing on harder tasks, achieving +0.16% on Fashion-MNIST ($p = 0.036$, $d = 1.07$), +0.43% on EMNIST, and +0.49% on Forest Cover. RigL converges to a characteristic fan-in distribution regardless of initialisation. Starting at this equilibrium allows the optimiser to refine weights rather than rearrange topology. Which neurons become hubs matters more than the degree of connectivity variance, i.e., random hub placement provides no advantage, while optimisation-driven placement does.
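
To make the gradient-hierarchy result concrete, here is a small sketch, assuming numpy and a per-neuron L2 gradient norm, of the two quantities the abstract correlates: the fan-in coefficient of variation of a layer's mask and the gradient concentration at the highest-fan-in "hub" neurons. The function name and the hub_frac cutoff are assumptions for illustration, not the authors' analysis code.

```python
# Illustrative hub-gradient analysis; hub_frac and the norm choice are assumptions.
import numpy as np

def fanin_cv_and_hub_concentration(mask, grad, hub_frac=0.1):
    """mask: (n_out, n_in) bool connectivity; grad: (n_out, n_in) weight gradients."""
    fanin = mask.sum(axis=1)
    cv = fanin.std() / fanin.mean()               # fan-in coefficient of variation

    # Per-neuron gradient norm over that neuron's incoming (active) weights.
    grad_norm = np.linalg.norm(grad * mask, axis=1)

    n_hubs = max(1, int(hub_frac * len(fanin)))
    hubs = np.argsort(fanin)[-n_hubs:]            # highest-fan-in neurons
    concentration = grad_norm[hubs].mean() / grad_norm.mean()
    return cv, concentration
```

On a structured profile this concentration ratio should land in the 2-5x range reported above, while a uniform random mask should sit near 1x.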

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.