From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines

April 15, 2026 | arXiv:2604.13468

Sunkyung Lee, Jihye Back, Donghyeon Jeon, Soonhwan Kwon, Moonkwon Kim + 2 more

cs.IR, cs.CL

TLDR

AuthGR introduces authority-aware generative retrieval for web search, improving trustworthiness and accuracy in high-stakes domains.

Key contributions

AuthGR is the first framework to integrate document authority into generative information retrieval.
Employs multimodal authority scoring, using a vision-language model to quantify document authority from textual and visual cues.
Utilizes a three-stage training pipeline to progressively instill authority awareness.
Achieves significant improvements in user engagement and reliability in online A/B tests and human evaluations on a commercial web search platform.

Why it matters

This paper addresses a critical gap in generative retrieval by integrating document authority, so that retrieved results are trustworthy as well as relevant. This matters most in high-stakes domains such as healthcare and finance, where relying on semantic relevance alone risks surfacing unreliable information. Online A/B tests and human evaluations on a commercial web search platform demonstrate its practical impact.
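To illustrate the trade-off between relevance and authority, here is a minimal Python sketch that blends the two signals when reranking candidates. This is not AuthGR's method. AuthGR instills authority awareness into a generative retriever through a three-stage training pipeline, whereas this sketch assumes each candidate already has a precomputed relevance score and authority score. The `Candidate` class, the `authority_aware_rank` function, the `alpha` weight, and the example documents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    doc_id: str
    relevance: float  # hypothetical relevance score in [0, 1]
    authority: float  # hypothetical authority score in [0, 1]


def authority_aware_rank(candidates, alpha=0.7):
    """Rank candidates by a convex blend of relevance and authority.

    alpha is a hypothetical weight: 1.0 ranks by relevance alone,
    0.0 ranks by authority alone.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")

    def blended(c):
        return alpha * c.relevance + (1 - alpha) * c.authority

    # Highest blended score first.
    return sorted(candidates, key=blended, reverse=True)


docs = [
    Candidate("forum-post", relevance=0.92, authority=0.20),
    Candidate("clinic-guideline", relevance=0.85, authority=0.95),
]
print([c.doc_id for c in authority_aware_rank(docs)])
# prints ['clinic-guideline', 'forum-post']
```

With `alpha=1.0` the slightly more relevant but low-authority forum post ranks first. Shifting some weight to authority promotes the clinical guideline, which is the kind of behavior an authority-aware retriever aims for in high-stakes queries.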

Original Abstract

Generative information retrieval (GenIR) formulates the retrieval process as a text-to-text generation task, leveraging the vast knowledge of large language models. However, existing works primarily optimize for relevance while often overlooking document trustworthiness. This is critical in high-stakes domains like healthcare and finance, where relying solely on semantic relevance risks retrieving unreliable information. To address this, we propose an Authority-aware Generative Retriever (AuthGR), the first framework that incorporates authority into GenIR. AuthGR consists of three key components: (i) Multimodal Authority Scoring, which employs a vision-language model to quantify authority from textual and visual cues; (ii) a Three-stage Training Pipeline to progressively instill authority awareness into the retriever; and (iii) a Hybrid Ensemble Pipeline for robust deployment. Offline evaluations demonstrate that AuthGR successfully enhances both authority and accuracy, with our 3B model matching a 14B baseline. Crucially, large-scale online A/B tests and human evaluations conducted on the commercial web search platform confirm significant improvements in real-world user engagement and reliability.
