ArXiv TLDR

Cortex-Inspired Continual Learning: Unsupervised Instantiation and Recovery of Functional Task Networks

arXiv:2604.24637

Kevin McKee, Thomas Hazy, Yicong Zheng, Zacharie Bugaud, Thomas Miconi

cs.LG · cs.AI · q-bio.NC

TLDR

Cortex-inspired Functional Task Networks (FTN) use self-organizing binary masks to prevent catastrophic forgetting and enable unsupervised task recovery in continual learning.

Key contributions

  • Introduces Functional Task Networks (FTN), a cortex-inspired parameter-isolation method for continual learning.
  • Uses a three-stage process (gradient descent, smoothing, k-winner-take-all) to create self-organizing binary masks (sketched in code after this list).
  • Ensures structural protection against catastrophic forgetting via disjoint gradient updates for different tasks.
  • Enables unsupervised task segmentation and recovery of prior solutions in a single gradient step.
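
To make the three-stage masking procedure concrete, here is a minimal PyTorch sketch of how such a mask could be produced. The population size H, capacity K, learning rate, kernel width, and the `loss_fn` hook (which is assumed to apply the soft mask inside the task network's forward pass) are illustrative assumptions, not the paper's actual code or hyperparameters.

```python
import torch
import torch.nn.functional as F

def select_task_mask(loss_fn, H=512, K=64, steps=50, lr=0.1, kernel_size=9):
    """Sketch of three-stage mask selection: descend, smooth, binarize."""
    logits = torch.zeros(H, requires_grad=True)   # continuous mask scores

    # Stage 1: gradient descent on a continuous (sigmoid) mask identifies
    # task-relevant units by minimizing the task loss under the soft mask.
    opt = torch.optim.SGD([logits], lr=lr)
    for _ in range(steps):
        soft_mask = torch.sigmoid(logits)
        loss = loss_fn(soft_mask)                 # caller applies the mask to its network
        opt.zero_grad()
        loss.backward()
        opt.step()

    # Stage 2: a smoothing kernel biases the scores toward spatially
    # contiguous groups of neighboring units (a 1-D box filter here).
    kernel = torch.ones(1, 1, kernel_size) / kernel_size
    smoothed = F.conv1d(logits.detach().view(1, 1, -1),
                        kernel, padding=kernel_size // 2).view(-1)

    # Stage 3: k-winner-take-all binarizes at a fixed capacity budget K.
    mask = torch.zeros(H)
    mask[smoothed.topk(K).indices] = 1.0
    return mask
```

The paper's FTN-Slow and FTN-Fast variants differ in how fine-grained the smoothing is (kernel size and number of smoothing iterations); the sketch shows a single smoothing pass for brevity.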

Why it matters

This paper introduces a novel, biologically inspired approach to continual learning that addresses both catastrophic forgetting and unsupervised task inference. Its efficient parameter-isolation method, FTN, achieves nearly zero forgetting on the three reported continual-learning benchmarks, offering a promising direction for robust AI systems.

Original Abstract

Block-sequential continual learning demands that a single model both protect prior solutions from catastrophic forgetting and efficiently infer, at inference time, which prior solution matches the current input without task labels. We present Functional Task Networks (FTN), a parameter-isolation method inspired by structural and dynamical motifs found in the mammalian neocortex. Similar to mixture-of-experts, this method uses a high-dimensional, self-organizing binary mask over a large population of small but deep networks, inspired by dendritic models of pyramidal neurons. The mask is produced by a three-stage procedure: (1) gradient descent on a continuous mask identifies task-relevant neurons, (2) a smoothing kernel biases the result toward spatial contiguity, and (3) k-winner-take-all binarizes the resulting group at a fixed capacity budget. Because each neuron is an independent deep network, as in mixture-of-experts, disjoint masks give exactly disjoint gradient updates, providing structural guarantees against catastrophic forgetting. This three-stage procedure recovers the sub-network of a previously trained task in a single gradient step, providing unsupervised task segmentation at inference time. We test it on three continual-learning benchmarks: (1) a synthetic multi-task classification/regression generator, (2) MNIST with shuffled class labels (pure concept shift), and (3) Permuted MNIST (domain shift). On all three, FTN with fine-grained smoothing (FTN-Slow) results in nearly zero forgetting. FTN with a large kernel and only 2 iterations of smoothing (FTN-Fast) trades off some retention for increased speed. We show that the spatial organization mechanism reduces the effective mask search from the combinatorial top-k subset problem in O(C(H,K)) to a near-linear scan in O(H) over compact cortical neighborhoods, which is parallelized by the gradient-based update.
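
On the structural-protection claim in the abstract: because each "neuron" is an independent small network and the binary mask zeroes its output, parameters outside a task's mask receive exactly zero gradient. The toy PyTorch sketch below illustrates this; the sizes and the simple output-gating scheme are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class NeuronPopulation(nn.Module):
    """A population of H independent small-but-deep 'neuron' networks."""
    def __init__(self, H=128, in_dim=32, width=16):
        super().__init__()
        self.neurons = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, width), nn.ReLU(), nn.Linear(width, 1))
             for _ in range(H)]
        )

    def forward(self, x, mask):
        # mask: (H,) binary vector; masked-out neurons contribute 0 to the output
        outs = torch.cat([n(x) for n in self.neurons], dim=-1)  # (batch, H)
        return outs * mask

pop = NeuronPopulation()
x = torch.randn(8, 32)
mask = torch.zeros(128)
mask[:16] = 1.0                          # pretend task A was assigned units 0..15
pop(x, mask).sum().backward()

# Every neuron outside task A's mask receives exactly zero gradient,
# so later tasks with disjoint masks cannot overwrite task A's solution.
for neuron in pop.neurons[16:]:
    for p in neuron.parameters():
        assert p.grad is None or p.grad.abs().sum() == 0
```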
