From Relevance to Authority: Authority-aware Generative Retrieval in Web Search Engines
Sunkyung Lee, Jihye Back, Donghyeon Jeon, Soonhwan Kwon, Moonkwon Kim + 2 more
TLDR
AuthGR introduces authority-aware generative retrieval for web search, improving trustworthiness and accuracy in high-stakes domains.
Key contributions
- AuthGR is the first framework to integrate document authority into generative information retrieval.
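- Scores document authority from textual and visual cues with a vision-language model (Multimodal Authority Scoring).
- Uses a three-stage training pipeline to progressively instill authority awareness into the retriever, and a Hybrid Ensemble Pipeline for robust deployment.
- Improves both authority and accuracy in offline evaluations, with a 3B model matching a 14B baseline, and shows significant gains in user engagement and reliability in online A/B tests and human evaluations.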
Why it matters
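Existing generative retrieval work optimizes mainly for relevance and often overlooks document trustworthiness. This paper closes that gap by building document authority into the retriever, which matters most in high-stakes domains such as healthcare and finance, where semantic relevance alone risks surfacing unreliable information. Deployment on a commercial web search platform, with online A/B tests and human evaluations showing gains in user engagement and reliability, demonstrates its practical impact.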
Original Abstract
Generative information retrieval (GenIR) formulates the retrieval process as a text-to-text generation task, leveraging the vast knowledge of large language models. However, existing works primarily optimize for relevance while often overlooking document trustworthiness. This is critical in high-stakes domains like healthcare and finance, where relying solely on semantic relevance risks retrieving unreliable information. To address this, we propose an Authority-aware Generative Retriever (AuthGR), the first framework that incorporates authority into GenIR. AuthGR consists of three key components: (i) Multimodal Authority Scoring, which employs a vision-language model to quantify authority from textual and visual cues; (ii) a Three-stage Training Pipeline to progressively instill authority awareness into the retriever; and (iii) a Hybrid Ensemble Pipeline for robust deployment. Offline evaluations demonstrate that AuthGR successfully enhances both authority and accuracy, with our 3B model matching a 14B baseline. Crucially, large-scale online A/B tests and human evaluations conducted on the commercial web search platform confirm significant improvements in real-world user engagement and reliability.
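The abstract does not say how authority and relevance are combined, so the following is only an illustrative sketch of the general idea of weighing authority alongside relevance when ordering candidates. It is not AuthGR's method. The `Candidate` type, the `rerank` function, the `authority_weight` parameter, the linear blend, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A retrieved document with two scores, both assumed to lie in [0, 1]."""
    doc_id: str
    relevance: float   # hypothetical semantic relevance score
    authority: float   # hypothetical authority score (e.g. from some scorer)


def rerank(candidates, authority_weight=0.3):
    """Order candidates by a linear blend of relevance and authority.

    The linear blend is an assumption made for illustration; the paper's
    abstract does not describe its scoring or ensemble formula.
    """
    if not 0.0 <= authority_weight <= 1.0:
        raise ValueError("authority_weight must be in [0, 1]")

    def blended(c):
        return (1.0 - authority_weight) * c.relevance + authority_weight * c.authority

    # sorted() is stable, so ties keep their original order.
    return sorted(candidates, key=blended, reverse=True)


# Made-up example: a highly relevant forum post vs. a slightly less
# relevant but far more authoritative hospital page.
docs = [
    Candidate("forum-post", relevance=0.9, authority=0.2),
    Candidate("hospital-page", relevance=0.8, authority=0.9),
]
relevance_only = [c.doc_id for c in rerank(docs, authority_weight=0.0)]
authority_aware = [c.doc_id for c in rerank(docs, authority_weight=0.3)]
```

With `authority_weight=0.0` the ordering is purely by relevance and the forum post ranks first. With the hypothetical weight of 0.3 the hospital page moves ahead, which is the kind of shift an authority-aware retriever aims for in domains like healthcare.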