ArXiv TLDR

The Generalized Turing Test: A Foundation for Comparing Intelligence

arXiv: 2605.10851

Daniel Mitropolsky, Susan S. Hong, Riccardo Neumarker, Emanuele Rimoldi, Tomaso Poggio

cs.AI · cs.CL · cs.LG

TLDR

The Generalized Turing Test (GTT) offers a formal, dataset-agnostic framework to compare AI agent intelligence via indistinguishability.

Key contributions

  • Introduces the Generalized Turing Test (GTT), a formal framework for comparing agent capabilities via indistinguishability.
  • Defines A ≥ B to hold if B, acting as a distinguisher, cannot reliably tell interactions with A (instructed to imitate B) apart from interactions with another instance of B.
  • Provides a dataset- and task-agnostic measure of relative intelligence, with the comparator studied for transitivity and analyzed in variants with querying, bounded interaction, and fixed distinguishers.
  • Empirical evaluation on modern models yields stratified rankings consistent with existing intelligence benchmarks.
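The pairwise test described above can be sketched as a simple trial loop. This is an illustrative sketch only, not the paper's protocol: the agent and distinguisher callables, the prompt set, and the chance-level threshold are all hypothetical stand-ins. Each trial shows the distinguisher one response from A (imitating B) and one from a fresh instance of B in random order; distinguishing accuracy near 0.5 (chance) supports A ≥ B.

```python
import random

def run_gtt_trials(agent_a, agent_b, distinguisher, prompts, n_trials=1000, seed=0):
    """Estimate how often a distinguisher tells A-imitating-B apart from B.

    agent_a, agent_b: prompt -> response callables (A is imitating B).
    distinguisher: (prompt, (resp0, resp1)) -> index of the suspected imitation.
    Returns the fraction of trials where the imitation was correctly identified.
    """
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        prompt = rng.choice(prompts)
        imitation = agent_a(prompt)   # A instructed to imitate B
        genuine = agent_b(prompt)     # another instance of B
        # Present the pair in random order; record where the imitation sits.
        if rng.random() < 0.5:
            pair, imitation_index = (imitation, genuine), 0
        else:
            pair, imitation_index = (genuine, imitation), 1
        if distinguisher(prompt, pair) == imitation_index:
            correct += 1
    return correct / n_trials

# Toy agents: a perfect imitator leaves the distinguisher no signal,
# so its accuracy hovers near chance (0.5), supporting A >= B.
agent_b = lambda p: p.upper()
perfect_imitator = lambda p: p.upper()
guessing_distinguisher = lambda p, pair: random.randrange(2)

# A flawed imitator leaks a detectable difference, so a distinguisher
# that exploits it scores well above chance, and A >= B fails.
flawed_imitator = lambda p: p.lower()
detecting_distinguisher = lambda p, pair: 0 if not pair[0].isupper() else 1
```

In the paper's instantiation the agents are modern models and B itself plays the distinguisher; the toy callables here only exist to make the trial structure concrete.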

Why it matters

This paper introduces a foundational, dataset-agnostic method for comparing AI intelligence, moving beyond traditional benchmarks. It offers a unifying lens for evaluation and potentially new training objectives that are inherently independent of fixed datasets.

Original Abstract

We introduce the Generalized Turing Test (GTT), a formal framework for comparing the capabilities of arbitrary agents via indistinguishability. For agents A and B, we define the Turing comparator A $\geq$ B to hold if B, acting as a distinguisher, cannot reliably distinguish between interactions with A (instructed to imitate B) and another instance of B. This yields a dataset- and task-agnostic notion of relative intelligence. We study the comparator's structure, including conditions under which it is transitive and therefore induces an ordering over equivalence classes, and we define and analyze variants with querying, bounded interaction, and fixed distinguishers. To complement the theory, we instantiate the framework on a collection of modern models, empirically evaluating pairwise indistinguishability across thousands of trials. The resulting comparisons exhibit a stratified structure consistent with existing rankings, hinting that the proposed framework yields meaningful empirical orderings. Our results position indistinguishability as a unifying lens for reasoning about intelligence, suggesting a foundation for evaluation and, potentially, training objectives that are inherently independent of fixed datasets or benchmarks.
