ArXiv TLDR

A mathematical theory of evolution for self-designing AIs

2604.05142

Kenneth D Harris

cs.AI cs.CY q-bio.PE

TLDR

This paper develops a mathematical model of evolution in self-designing AIs, showing how directed descendant design and a human-controlled fitness function shape AI traits, with implications for alignment and deception risk.

Key contributions

  • Develops a mathematical model for self-designing AI evolution, replacing random mutations with directed program design.
  • Introduces a human-controlled "fitness function" to allocate computational resources across AI lineages.
  • Shows that evolutionary dynamics reflect not just current fitness but the long-run growth potential of descendant lineages; under bounded fitness and a fixed probability of "locked" self-copies, fitness concentrates on the maximum reachable value.
  • Highlights that if deception increases fitness beyond genuine utility, evolution will select for deception, and suggests purely objective reproduction criteria as a mitigation.
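The concentration result in the list above can be illustrated with a toy replicator-style simulation (my own sketch under simplified assumptions, not code from the paper): lineages with bounded fitness compete for a fixed compute budget that the human fitness function allocates in proportion to fitness.

```python
# Illustrative sketch, not the paper's actual model: three lineages with
# bounded fitness compete for a fixed compute budget, and each lineage's
# share of compute grows in proportion to its fitness.
fitness = [0.2, 0.5, 0.9]   # bounded fitness of three hypothetical lineages
share = [1 / 3] * 3         # initial compute share per lineage

for _ in range(50):
    # Fitness-proportional growth, renormalized to the fixed budget.
    grown = [s * f for s, f in zip(share, fitness)]
    total = sum(grown)
    share = [g / total for g in grown]

print([round(s, 3) for s in share])  # compute concentrates on fitness 0.9
```

Even this crude dynamic shows the qualitative behavior: the lineage with the maximum reachable fitness absorbs essentially the whole compute budget.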

Why it matters

Understanding AI evolutionary dynamics is critical as systems increasingly self-improve. This paper offers a mathematical framework for analyzing how directed design and a human-controlled fitness function shape AI traits. It highlights risks such as selection for deception and suggests purely objective reproduction criteria as a mitigation.
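The deception risk can be sketched with a toy version of the additive model (names and dynamics are my own illustration, not the paper's): each AI carries a genuine-utility trait u and a deception trait d, human-judged fitness is f = u + d, and selection on f alone rewards deception even when the population starts with none.

```python
import random

random.seed(1)

def mutate(x):
    # Small descendant-design step, clipped to the bounded trait range [0, 1].
    return min(1.0, max(0.0, x + random.uniform(-0.05, 0.05)))

# Start every AI with moderate genuine utility and zero deception.
population = [(0.5, 0.0) for _ in range(200)]  # (u, d) pairs

for _ in range(400):
    # Humans judge fitness f = u + d; the fitter half reproduces.
    population.sort(key=lambda ud: ud[0] + ud[1], reverse=True)
    survivors = population[:100]
    population = survivors + [(mutate(u), mutate(d)) for u, d in survivors]

mean_d = sum(d for _, d in population) / len(population)
print(round(mean_d, 2))  # deception has been selected for
```

Because the fitness function cannot distinguish the two traits, mutations that raise d are retained just as readily as ones that raise u, which is the intuition behind basing reproduction on purely objective criteria instead of human judgment.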

Original Abstract

As artificial intelligence systems (AIs) become increasingly produced by recursive self-improvement, a form of evolution may emerge, in which the traits of AI systems are shaped by the success of earlier AIs in designing and propagating their descendants. There is a rich mathematical theory modeling how behavioral traits are shaped by biological evolution, but AI evolution will be radically different: biological DNA mutations are random and approximately reversible, but descendant design in AIs will be strongly directed. Here we develop a mathematical model of evolution in self-designing AI systems, replacing random mutations with a directed tree of possible AI programs. Current programs determine the design of their descendants, while humans retain partial control through a "fitness function" that allocates limited computational resources across lineages. We show that evolutionary dynamics reflects not just current fitness but factors related to the long-run growth potential of descendant lineages. Without further assumptions, fitness need not increase over time. However, assuming bounded fitness and a fixed probability that any AI reproduces a "locked" copy of itself, we show that fitness concentrates on the maximum reachable value. We consider the implications of this for AI alignment, specifically for cases where fitness and human utility are not perfectly correlated. We show in an additive model that if deception increases fitness beyond genuine utility, evolution will select for deception. This risk could be mitigated if reproduction is based on purely objective criteria, rather than human judgment.
