ArXiv TLDR

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

arXiv:2605.08019

Botos Csaba, Sreejan Kumar, Austin Tudor, David Andrews, Laurence Hunt, Chris Summerfield + 4 more

cs.AI, q-bio.NC

TLDR

Frontier Large Reasoning Models (LRMs) align with human game-learning behavior and brain activity, outperforming deep reinforcement learning agents.

Key contributions

  • Compared LRMs, RL, and Bayesian agents learning novel video games with human fMRI data.
  • Frontier LRMs most closely matched human behavioral patterns during game discovery.
  • LRMs predicted human brain activity an order of magnitude better than RL agents.
  • Brain alignment reflects the LRM's in-context representation of the game state, not downstream planning or reasoning.

Why it matters

This paper establishes frontier Large Reasoning Models (LRMs) as compelling computational accounts of human learning and decision-making in complex, naturalistic environments. It provides neuroscientific evidence for LRMs' ability to capture human-like cognitive processes, advancing our understanding of AI-human alignment.

Original Abstract

Humans rapidly learn abstract knowledge when encountering novel environments and flexibly deploy this knowledge to guide efficient and intelligent action. Can modern AI systems learn and plan in a similar way? We study this question using a dataset of complex human gameplay with concurrent fMRI recordings, in which participants learn novel video games that require rule discovery, hypothesis revision, and multi-step planning. We jointly evaluate models by their ability to play the games, match human learning behavior, and predict brain activity during the same task, comparing a suite of frontier Large Reasoning Models (LRMs) against model-free and model-based deep reinforcement learning agents and a Bayesian theory-based agent. We find that frontier LRMs most closely match human behavioral patterns during game discovery and predict brain activity an order of magnitude better than both reinforcement learning alternatives across cortical and subcortical regions, with effects robust to permutation controls. Through targeted manipulations, we further show that brain alignment reflects the model's in-context representation of the game state rather than its downstream planning or reasoning. Our results establish LRMs as compelling computational accounts of human learning and decision making in complex, naturalistic environments. Project page with interactive replays: https://botcs.github.io/reason-to-play/
