The Causally Emergent Alignment Hypothesis: Causal Emergence Aligns with and Predicts Final Reward in Reinforcement Learning Agents
Federico Pigozzi, Michael Levin
TLDR
This paper proposes the Causally Emergent Alignment Hypothesis: causal emergence measured in the latent representations of RL agents predicts final reward early in training and tracks reward improvement over the course of learning.
Key contributions
- Introduces the Causally Emergent Alignment Hypothesis for RL agents.
- Shows causal emergence (CE) in latent spaces predicts final reward early in training.
- Demonstrates CE dynamics align with reward improvement across diverse RL tasks.
- Suggests CE as a novel axis for neural representation reorganization in RL.
Why it matters
This work reveals a fundamental link between causal emergence and successful learning in RL, offering a new perspective on how agents develop effective representations. Understanding this alignment could lead to designing more efficient and robust AI systems.
Original Abstract
A hallmark of life on Earth is the ability of agents to exert causal power and be drivers of subsequent events. This is key to cognition at all scales. Causal emergence, measuring the degree to which an agent exerts unique predictive power on its future, is one consequence of causal power. Indeed, recent discoveries have shown that biological agents, even minimal ones, increase their causal emergence after learning new memories. However, there is a major knowledge gap regarding how causally emergent artificial agents are. We focused on Reinforcement Learning (RL) of neural-network agents across an array of environmental conditions, encompassing different algorithms, agent architectures, and six environments arranged on a complexity spectrum. For consistency, we computed the causal emergence of their latent-space representations over their lifetimes. We used the recently proposed ΦID to estimate causal emergence and tested how it related to learning performance. Our results suggested a Causally Emergent Alignment Hypothesis: successful agents exhibited causal emergence that was consistently predictive of final reward early in training and whose representational dynamics aligned with reward improvement in most tasks. This idea suggests that causal emergence may be a previously undisclosed axis of reorganization of neural representations in RL agents, with the potential to establish causal relationships and interventions that will lead to better RL agents. Our work also highlights the alignment between causal emergence and learning as another way biological and artificial creatures compare.
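The abstract describes estimating causal emergence of latent representations with ΦID. The full ΦID decomposition is involved, but a widely used practical criterion from the same line of work (Rosas et al.'s Ψ) checks whether a macro variable V predicts its own future better than the sum of its micro parts do: Ψ = I(V_t; V_{t+1}) − Σ_j I(X_t^j; V_{t+1}), with Ψ > 0 taken as evidence of emergence. The sketch below is not the paper's pipeline; it is a minimal illustration of that criterion on hypothetical toy dynamics (a parity macro over churning binary micro states), using simple histogram-based mutual information.

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram-based estimate of I(x; y) in bits."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

def psi(micro, macro):
    """Rosas et al.'s practical emergence criterion Psi.

    micro: array of shape (T, n) -- n micro time series
    macro: array of shape (T,)   -- candidate macro time series
    Positive Psi suggests the macro variable is causally emergent.
    """
    v_t, v_next = macro[:-1], macro[1:]
    self_pred = mutual_info(v_t, v_next)
    micro_pred = sum(mutual_info(micro[:-1, j], v_next)
                     for j in range(micro.shape[1]))
    return self_pred - micro_pred

# Hypothetical toy system: pair flips preserve parity, so the parity
# macro persists while individual micro bits churn rapidly.
rng = np.random.default_rng(0)
T, n = 20000, 6
x = rng.integers(0, 2, size=n)
micro = np.empty((T, n))
for t in range(T):
    micro[t] = x
    x = x.copy()
    i, j = rng.choice(n, size=2, replace=False)
    x[i] ^= 1
    x[j] ^= 1                      # pair flip: parity unchanged
    if rng.random() < 0.05:
        x[rng.integers(n)] ^= 1    # rare single flip: parity changes
macro = micro.sum(axis=1) % 2      # parity as the macro variable
print(round(psi(micro, macro), 2))
```

Here Ψ comes out positive: the parity of all bits predicts its own future well, while each individual bit carries almost no information about the next parity. The paper's ΦID-based measure is richer than this single scalar, but the sketch conveys the core idea of unique macro-level predictive power.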