When Does Structure Matter in Continual Learning? Dimensionality Controls When Modularity Shapes Representational Geometry
Kathrin Korte, Joachim Winter Pedersen, Eleni Nisioti, Sebastian Risi
TLDR
This paper shows that representational dimensionality dictates when modular architectures benefit continual learning: architecture matters little in high-dimensional regimes but becomes decisive in low-dimensional ones.
Key contributions
- Compares modular vs. single-module networks in sequential tasks with varying similarity.
- Finds architecture matters little in high-dimensional regimes where representations are unconstrained.
- Shows modularity is crucial in low-dimensional regimes, enabling adaptive, graded subspace geometry.
- Identifies representational dimensionality as the key factor determining when structural separation becomes functional (one common estimator is sketched after this list).
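The effective dimensionality referenced in these contributions can be made concrete with a participation-ratio estimate computed over hidden-state activations. The paper does not state which estimator it uses, so the snippet below is only a minimal sketch under that assumption; the function name, array shapes, and toy data are illustrative.

```python
import numpy as np

def effective_dimensionality(hidden_states):
    """Participation-ratio estimate of effective dimensionality.

    hidden_states: array of shape (num_samples, num_units), e.g. recurrent
    hidden activations pooled across time steps and trials. Returns
    (sum_i lambda_i)^2 / sum_i lambda_i^2 for the eigenvalues lambda_i of
    the activation covariance matrix (a standard estimator; the paper's
    exact choice is an assumption here).
    """
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (centered.shape[0] - 1)
    eigvals = np.clip(np.linalg.eigvalsh(cov), 0.0, None)  # guard tiny negative values
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

# Toy check: 200 units whose activity lies in a roughly 5-dimensional subspace.
rng = np.random.default_rng(0)
latent = rng.normal(size=(1000, 5))
states = latent @ rng.normal(size=(5, 200)) + 0.01 * rng.normal(size=(1000, 200))
print(effective_dimensionality(states))  # close to 5
```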
Why it matters
This paper clarifies when modular architectures are beneficial in continual learning, introducing representational dimensionality as a critical organizing variable. By showing how structure and dimensionality interact to balance plasticity and stability, it offers a principle for designing more effective continual learning systems.
Original Abstract
To preserve previously learned representations, continual learning systems must strike a balance between plasticity, the ability to acquire new knowledge, and stability. This stability-plasticity dilemma affects how representations can be reused across tasks: shared structure enables transfer when tasks are similar but may also induce interference when new learning disrupts existing representations. However, it remains unclear when and why structural separation influences this trade-off. In this study, we examine how network architecture, task similarity, and representational dimensionality jointly shape learning in a sequential task paradigm inspired by transfer-interference studies. We compare a task-partitioned modular recurrent network with a single-module baseline by systematically varying task similarity (low, medium, high) and the scale of weight initialization, which induces different learning regimes that we empirically characterize through the effective dimensionality of the learned representations. We find that architecture has minimal impact in high-dimensional regimes where representations are sufficiently unconstrained to accommodate multiple tasks without strong interference. In contrast, in lower-dimensional (rich) regimes, architectural separation is decisive: modular networks exhibit graded alignment of task-specific subspaces with overlap for similar tasks, partial orthogonalization for moderately dissimilar tasks, and stronger separation for dissimilar tasks. This graded geometry is absent in the single network baseline. Our findings suggest that representational dimensionality acts as a key organizing variable governing when structural separation becomes functionally relevant, and highlight adaptive geometry as a central principle for designing continual learning systems.
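The graded alignment of task-specific subspaces described in the abstract can likewise be quantified, for instance via the principal angles between each task's leading activation components. The abstract does not specify the alignment measure used, so the following is a minimal sketch under that assumption; the number of components k, the helper names, and the toy data are illustrative.

```python
import numpy as np

def task_subspace(hidden_states, k=5):
    """Orthonormal basis (num_units x k) spanning the top-k principal
    components of one task's hidden activations."""
    centered = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def subspace_overlap(basis_a, basis_b):
    """Mean squared cosine of the principal angles between two subspaces:
    1.0 for identical subspaces (shared structure / reuse), low values for
    near-orthogonal subspaces (structural separation)."""
    cosines = np.linalg.svd(basis_a.T @ basis_b, compute_uv=False)
    return float(np.mean(cosines ** 2))

# Toy usage: activations recorded during two tasks, shape (num_samples, num_units).
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 5))
mixing = rng.normal(size=(5, 100))
states_a = latent @ mixing + 0.05 * rng.normal(size=(500, 100))
states_b = latent @ mixing + 0.05 * rng.normal(size=(500, 100))    # same underlying subspace
states_c = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 100))   # unrelated subspace
print(subspace_overlap(task_subspace(states_a), task_subspace(states_b)))  # near 1
print(subspace_overlap(task_subspace(states_a), task_subspace(states_c)))  # near k/num_units
```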