From Citation Selection to Citation Absorption: A Measurement Framework for Generative Engine Optimization Across AI Search Platforms
TLDR
A new framework measures Generative Engine Optimization (GEO) by analyzing how AI search platforms select and absorb information from cited sources.
Key contributions
- Introduces a two-stage framework for Generative Engine Optimization (GEO): selection and absorption.
- Analyzes 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity, drawing on 21,143 valid search-layer citations.
- Shows that citation breadth (how many sources a platform cites) and citation depth (how strongly each cited page influences the answer) diverge across platforms.
- Identifies high-influence pages as longer, structured, aligned, and rich in extractable evidence.
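The breadth-versus-depth distinction above can be sketched as two simple per-platform statistics. This is a minimal illustration with made-up records; the field names (`platform`, `prompt_id`, `influence`) are assumptions, not the dataset's actual schema.

```python
from statistics import mean

# Hypothetical citation records (illustrative values, not real data).
citations = [
    {"platform": "perplexity", "prompt_id": 1, "influence": 0.10},
    {"platform": "perplexity", "prompt_id": 1, "influence": 0.05},
    {"platform": "perplexity", "prompt_id": 2, "influence": 0.08},
    {"platform": "chatgpt", "prompt_id": 1, "influence": 0.60},
    {"platform": "chatgpt", "prompid": 2, "influence": 0.50} | {"prompt_id": 2},
]

def breadth(records, platform):
    """Citation breadth: average number of sources cited per prompt."""
    per_prompt = {}
    for r in records:
        if r["platform"] == platform:
            per_prompt[r["prompt_id"]] = per_prompt.get(r["prompt_id"], 0) + 1
    return mean(per_prompt.values())

def depth(records, platform):
    """Citation depth: average influence score across a platform's citations."""
    return mean(r["influence"] for r in records if r["platform"] == platform)
```

Under these toy numbers, the Perplexity-like platform cites more sources per prompt while the ChatGPT-like platform shows higher average influence per citation, mirroring the paper's descriptive finding.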
Why it matters
This paper redefines Generative Engine Optimization (GEO) by introducing a framework that measures not just citation counts but also the absorption of information into AI-generated answers. It offers critical insights for content creators and SEO specialists on how to optimize for the evolving landscape of AI search.
Original Abstract
Generative search engines increasingly determine whether online information is merely discoverable, cited as a source, or actually absorbed into generated answers. This paper proposes a two-stage measurement framework for Generative Engine Optimization (GEO): citation selection, where a platform triggers search and chooses sources, and citation absorption, where a cited page contributes language, evidence, structure, or factual support to the final answer. We analyze the public geo-citation-lab dataset covering 602 controlled prompts across ChatGPT, Google AI Overview/Gemini, and Perplexity; 21,143 valid search-layer citations; 23,745 citation-level feature records; 18,151 successfully fetched pages; and 72 extracted features. The central descriptive finding is that citation breadth and citation depth diverge. Perplexity and Google cite more sources on average, while ChatGPT cites fewer sources but shows substantially higher average citation influence among fetched pages. High-influence pages tend to be longer, more structured, semantically aligned, and richer in extractable evidence such as definitions, numerical facts, comparisons, and procedural steps. The results suggest that GEO should be measured beyond citation counts, with answer-level absorption treated as a separate outcome.
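The abstract's notion of "extractable evidence" (definitions, numerical facts, comparisons, procedural steps) could be approximated with simple text heuristics. The regexes below are illustrative stand-ins for the paper's 72 extracted features, not its actual feature definitions.

```python
import re

def evidence_features(text: str) -> dict:
    """Rough heuristic counts of extractable-evidence signals on a page.

    Each pattern is a hypothetical proxy: numbers for numerical facts,
    definitional phrases, comparative phrases, and numbered-step lines.
    """
    return {
        "numerical_facts": len(re.findall(r"\d+(?:\.\d+)?%?", text)),
        "definitions": len(
            re.findall(r"\b(?:is defined as|refers to|means)\b", text, re.I)
        ),
        "comparisons": len(
            re.findall(r"\b(?:versus|vs\.|compared to|more than|less than)\b",
                       text, re.I)
        ),
        "procedural_steps": len(
            re.findall(r"(?m)^\s*(?:\d+\.|Step \d+)", text)
        ),
    }

page = """GEO refers to optimizing content for AI answers.
1. Add definitions.
2. Include numbers: 42% of citations matter more than others.
Structured pages perform better compared to flat ones."""
```

Scoring fetched pages this way would let one test, descriptively, whether evidence-dense pages receive higher citation influence.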