How Much is Brain Data Worth for Machine Learning?
Lane Lewis, Zhixin Wang, David Schwab, Xaq Pitkow
TLDR
This paper quantifies the value of brain data for machine learning, deriving scaling laws and exchange rates between brain and task samples.
Key contributions
- Formulates brain data value mathematically with a linear Gaussian model.
- Derives scaling laws for ML performance based on brain and task sample sizes.
- Quantifies relative value and exchange rates between brain and task samples.
- Identifies conditions for robustness gains from brain-regularized learning.
Why it matters
NeuroAI explores using neural data to improve ML, but the practical value of that data is often unclear. This paper provides a theoretical framework for quantifying when, and by how much, brain data benefits ML models, offering foundational insights for planning data collection in NeuroAI.
Original Abstract
If a person can solve a task, can measuring their brain make it easier to train a model to solve that task too? Recent NeuroAI work suggests that supplementing task training with neural recordings can modestly improve model performance and robustness. However, it is unclear when there should be a benefit from using neural data and how much benefit to expect. We formulate this question mathematically, and begin to address it theoretically using a simple, analytically tractable linear Gaussian model of task targets and neural recordings. For a multimodal estimator trained on both brain data and task labels, we derive scaling laws for how performance scales with the numbers of brain and task samples. From these laws we derive relative value and exchange rates between brain samples and task samples, quantifying how many extra task samples neural data is worth as a function of task-brain alignment, neural and task noise, latent dimension, and brain data sample size. We also analyze test distribution shift, to identify conditions where brain-regularized learning can produce substantial robustness gains through learned invariances. Finally, under a fixed collection budget, we characterize the regimes in which brain data is worth collecting. Our results provide a foundation for understanding how valuable brain data could be for improving machine learning.
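To make the idea of an exchange rate concrete, here is a minimal sketch in a deliberately simplified scalar Gaussian toy. It is not the paper's model or its derived scaling laws. The assumptions are ours: a single task parameter is estimated from `n_task` noisy task samples, the brain measures a parameter that differs from it by a fixed random offset with variance `misalignment_var` (a stand-in for imperfect task-brain alignment), and the two sources are combined by inverse-variance weighting. All function and parameter names are hypothetical.

```python
import math


def brain_estimate_variance(m_brain, brain_noise_var, misalignment_var):
    """Variance of the brain-only estimate of the task parameter (toy assumption).

    Averaging m_brain recordings shrinks measurement noise as 1/m_brain,
    but the misalignment offset is shared by all recordings and never averages out.
    """
    if m_brain <= 0:
        return math.inf
    if brain_noise_var <= 0:
        raise ValueError("brain_noise_var must be positive in this toy")
    return misalignment_var + brain_noise_var / m_brain


def combined_mse(n_task, m_brain, task_noise_var, brain_noise_var, misalignment_var):
    """Mean squared error of the inverse-variance weighted combination."""
    precision = n_task / task_noise_var
    brain_var = brain_estimate_variance(m_brain, brain_noise_var, misalignment_var)
    if math.isfinite(brain_var):
        precision += 1.0 / brain_var
    return math.inf if precision == 0 else 1.0 / precision


def equivalent_task_samples(m_brain, task_noise_var, brain_noise_var, misalignment_var):
    """How many extra task samples the brain data is worth in this toy."""
    brain_var = brain_estimate_variance(m_brain, brain_noise_var, misalignment_var)
    return 0.0 if math.isinf(brain_var) else task_noise_var / brain_var


if __name__ == "__main__":
    # Sweep the brain sample count for an imperfectly aligned brain signal.
    for m in (1, 10, 100, 1000):
        worth = equivalent_task_samples(m, task_noise_var=1.0,
                                        brain_noise_var=1.0, misalignment_var=0.1)
        print(f"m_brain={m:5d}  worth about {worth:.2f} task samples")
```

In this toy, perfectly aligned brain data with the same noise level trades one-for-one with task samples, while any misalignment caps the value of brain data at `task_noise_var / misalignment_var` no matter how many recordings are collected. The paper's linear Gaussian analysis additionally accounts for latent dimension and distribution shift, which this sketch ignores.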