Semantic Recall for Vector Search
Leonardo Kuffo, Ioanna Tsakalidou, Roberta De Viti, Albert Angel, Jiří Iša + 1 more
TLDR
Introduces Semantic Recall, a new metric for ANN search that counts only semantically relevant objects, so an algorithm is not marked down for missing irrelevant neighbors.
Key contributions
- Introduces Semantic Recall, a novel metric for evaluating approximate nearest neighbor (ANN) search.
- Unlike traditional recall, it considers only semantically relevant objects, so an algorithm is not penalized for missing nearest neighbors that are irrelevant to the query.
- Especially useful for queries with few relevant results, a common scenario in embedding datasets.
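- Proposes Tolerant Recall, a proxy that approximates Semantic Recall when semantically relevant objects cannot be identified, and shows empirically that optimizing search algorithms for these metrics can improve cost-quality tradeoffs.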
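The difference between the two metrics can be illustrated with a minimal sketch. It is based only on the abstract's description (relevant objects that exact nearest neighbor search would return), not on the paper's formal definition. The function names, the toy IDs, and the convention of returning 1.0 when no relevant neighbor exists are assumptions made for illustration.

```python
def traditional_recall(retrieved, exact_neighbors):
    """Fraction of the exact nearest neighbors that the ANN search returned."""
    exact = set(exact_neighbors)
    if not exact:
        return 1.0  # assumption: nothing to retrieve counts as perfect recall
    return len(exact & set(retrieved)) / len(exact)


def semantic_recall(retrieved, exact_neighbors, relevant):
    """Like traditional recall, but only over exact neighbors that are
    also semantically relevant to the query (illustrative reading of the abstract)."""
    target = set(exact_neighbors) & set(relevant)
    if not target:
        return 1.0  # assumption: no relevant neighbor exists, so none was missed
    return len(target & set(retrieved)) / len(target)


# Toy query: 5 exact nearest neighbors, only 2 of which are relevant.
exact_neighbors = [1, 2, 3, 4, 5]
relevant = {1, 2}
retrieved = [1, 2, 9, 10, 11]  # the ANN search found both relevant objects

print(traditional_recall(retrieved, exact_neighbors))         # 0.4
print(semantic_recall(retrieved, exact_neighbors, relevant))  # 1.0
```

In this toy case the search returns every relevant object, yet traditional recall reports 0.4 because three irrelevant neighbors were missed. Semantic recall reports 1.0.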
Why it matters
Traditional recall penalizes an ANN algorithm for missing any exact nearest neighbor, including neighbors that are semantically irrelevant to the query. Semantic Recall counts only the relevant ones, which gives a more accurate assessment of retrieval quality, especially for queries with few relevant results among their nearest neighbors. The authors also report that optimizing search algorithms for these metrics can improve cost-quality tradeoffs.
Original Abstract
We introduce Semantic Recall, a novel metric to assess the quality of approximate nearest neighbor search algorithms by considering only semantically relevant objects that are theoretically retrievable via exact nearest neighbor search. Unlike traditional recall, semantic recall does not penalize algorithms for failing to retrieve objects that are semantically irrelevant to the query, even if those objects are among their nearest neighbors. We demonstrate that semantic recall is particularly useful for assessing retrieval quality on queries that have few relevant results among their nearest neighbors, a scenario we uncover to be common within embedding datasets. Additionally, we introduce Tolerant Recall, a proxy metric that approximates semantic recall when semantically relevant objects cannot be identified. We empirically show that our metrics are more effective indicators of retrieval quality, and that optimizing search algorithms for these metrics can lead to improved cost-quality tradeoffs.