When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry
Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi
TLDR
This paper shows that representational dimensionality dictates when modular architectures benefit continual learning: architecture matters little in high-dimensional regimes but becomes decisive in low-dimensional ones.
Key contributions
- Compares modular vs. single-module networks in sequential tasks with varying similarity.
- Finds architecture matters little in high-dimensional regimes where representations are unconstrained.
- Shows modularity is crucial in low-dimensional regimes, enabling adaptive, graded subspace geometry.
- Identifies representational dimensionality as the key factor determining when structural separation becomes functional (one common estimator is sketched after this list).
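The effective dimensionality referenced in these contributions can be made concrete with a participation-ratio estimate computed over hidden-state activations. The paper does not state which estimator it uses, so the snippet below is only a minimal sketch under that assumption; the function name, array shapes, and toy data are illustrative.

```python
import numpy as np

def effective_dimensionality(hidden_states):
    """Participation-ratio estimate of effective dimensionality.

    hidden_states: array of shape (num_samples, num_units), e.g. recurrent
    hidden activations pooled across time steps and trials. Returns
    (sum_i lambda_i)^2 / sum_i lambda_i^2 for the eigenvalues lambda_i of
    the activation covariance matrix (a standard estimator; the paper's
    exact choice is an assumption here).
    """
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negative values
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

# Toy check: 200 units whose activity lies in a roughly 5-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
states = latent @ rng.normal(size=(5, 200)) + 0.01 * rng.normal(size=(1000, 200))
print(effective_dimensionality(states))  # close to 5
```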
Why it matters
This paper clarifies when modular architectures are beneficial in continual learning, introducing representational dimensionality as a critical organizing variable. By showing how structure and dimensionality interact to balance plasticity and stability, it offers a principle for designing more effective continual learning systems.
Original Abstract
To preserve previously learned representations, continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar but may also induce interference when new learning disrupts existing representations. However, it remains unclear when and why structural separation influences this trade-off. In this study, we examine how network architecture, task similarity, and representational dimensionality jointly shape learning in a sequential task paradigm inspired by transfer-interference studies. We compare a task-partitioned modular recurrent network with a single-module baseline by systematically varying task similarity (low, medium, high) and the scale of weight initialization, which induces different learning regimes that we empirically characterize through the effective dimensionality of the learned representations. We find that architecture has minimal impact in high-dimensional regimes where representations are sufficiently unconstrained to accommodate multiple tasks without strong interference. In contrast, in lower-dimensional (rich) regimes, architectural separation is decisive: modular networks exhibit graded alignment of task-specific subspaces with overlap for similar tasks, partial orthogonalization for moderately dissimilar tasks, and stronger separation for dissimilar tasks. This graded geometry is absent in the single network baseline. Our findings suggest that representational dimensionality acts as a key organizing variable governing when structural separation becomes functionally relevant, and highlight adaptive geometry as a central principle for designing continual learning systems.
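The graded alignment of task-specific subspaces described in the abstract can likewise be quantified, for instance via the principal angles between each task's leading activation components. The abstract does not specify the alignment measure used, so the following is a minimal sketch under that assumption; the number of components k, the helper names, and the toy data are illustrative.

```python
import numpy as np

def task_subspace(hidden_states, k=5):
    """Orthonormal basis (num_units x k) spanning the top-k principal
    components of one task's hidden activations."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def subspace_overlap(basis_a, basis_b):
    """Mean squared cosine of the principal angles between two subspaces:
    1.0 for identical subspaces (shared structure / reuse), low values for
    near-orthogonal subspaces (structural separation)."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Toy usage: activations recorded during two tasks, shape (num_samples, num_units).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 100))
states_a = latent @ mixing + 0.05 * rng.normal(size=(500, 100))
states_b = latent @ mixing + 0.05 * rng.normal(size=(500, 100))    # same underlying subspace
states_c = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 100))   # unrelated subspace
print(subspace_overlap(task_subspace(states_a), task_subspace(states_b)))  # near 1
print(subspace_overlap(task_subspace(states_a), task_subspace(states_c)))  # near k/num_units
```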