Make Any Collection Navigable: Methods for Constructing and Evaluating Hypergraph of Text
Dean E. Alvarez, ChengXiang Zhai
TLDR
This paper introduces methods for constructing a Hypergraph of Text (HoT) to make any document collection navigable, along with a novel evaluation metric.
Key contributions
- Presents methods for constructing a Hypergraph of Text (HoT) for document navigation.
- Introduces "effort ratio," a novel quantitative metric for evaluating HoT structural quality.
- Shows experimentally that even simple TF-IDF baselines can match LLM-based methods on the new effort ratio metric.
Why it matters
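A Hypergraph of Text can give an arbitrary document collection the kind of navigable structure that hyperlinks give the Web. The effort ratio offers a quantitative way to compare construction methods. Simple TF-IDF baselines can match LLM-based methods on this metric, which suggests that inexpensive methods may be enough for building practical navigation and information discovery systems.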
Original Abstract
One reason the Web is more useful than a simple collection of documents is that the structure created by hyperlinks enables flexible navigation from one web page to another. However, hyperlinks are typically created manually and cannot fully capture a corpus' implicit semantic structures. Is there a general way to make an arbitrary collection navigable? Recent work has formalized this problem generally as constructing a Hypergraph of Text (HoT), which provides a formal mathematical structure for supporting navigation and browsing. However, how to construct and evaluate a Hypergraph of Text remains a challenge. In this paper, we propose and study several methods for constructing a HoT. We also propose a novel quantitative metric, effort ratio, for evaluating the structural quality of a constructed HoT. Experimental results show that even simple TF-IDF baselines can match LLM-based methods on our proposed effort ratio metric.
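The summary does not spell out the paper's construction methods or how the effort ratio is computed, so the following is only a hypothetical sketch of what a TF-IDF-style HoT baseline could look like. It assumes documents are nodes and that each salient term shared by several documents becomes a hyperedge grouping the documents that rank it among their top terms. The names `build_hypergraph`, `neighbors`, `top_k`, `min_df`, and the stopword list are illustrative assumptions, not the authors' code, and the idf formula is one common smoothed variant.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical minimal stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "for", "of", "the", "with"}


def tokenize(text):
    """Lowercase, split on non-alphanumeric characters, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def tfidf(docs):
    """Return one {term: weight} dict per document (raw tf times smoothed idf)."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def build_hypergraph(docs, top_k=2, min_df=2):
    """Hypothetical HoT baseline: map each hyperedge (a term) to the set of
    document ids that rank that term among their top_k TF-IDF terms.

    Only terms occurring in at least min_df documents are candidates, since a
    term unique to one document cannot link it to anything else.
    """
    weights = tfidf(docs)
    df = Counter(t for w in weights for t in w)
    edges = defaultdict(set)
    for doc_id, w in enumerate(weights):
        shared = [(t, s) for t, s in w.items() if df[t] >= min_df]
        # Highest weight first; ties broken alphabetically for determinism.
        ranked = sorted(shared, key=lambda kv: (-kv[1], kv[0]))
        for term, _ in ranked[:top_k]:
            edges[term].add(doc_id)
    # Keep only hyperedges that actually connect two or more documents.
    return {t: ids for t, ids in edges.items() if len(ids) >= 2}


def neighbors(hypergraph, doc_id):
    """Documents reachable from doc_id in one hop via any shared hyperedge."""
    out = set()
    for ids in hypergraph.values():
        if doc_id in ids:
            out |= ids
    out.discard(doc_id)
    return out


if __name__ == "__main__":
    docs = [
        "hypergraph navigation for text collections",
        "navigation of text with hyperlinks",
        "tfidf weighting for text retrieval",
        "retrieval with tfidf and language models",
    ]
    hg = build_hypergraph(docs)
    for term, ids in sorted(hg.items()):
        print(term, sorted(ids))
    print("neighbors of doc 0:", sorted(neighbors(hg, 0)))
```

In this toy corpus the first two documents end up linked through the "navigation" and "text" hyperedges and the last two through "retrieval" and "tfidf". Restricting candidates to shared terms is a design choice made here so that the highest-idf terms, which are usually unique to one document, do not crowd out terms that can actually connect documents.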