ArXiv TLDR

Geometric Analysis of Self-Supervised Vision Representations for Semantic Image Retrieval

arXiv:2604.24469

Esteban Rodríguez-Betancourt, Edgar Casasola-Murillo

cs.IR, cs.CV

TLDR

This paper analyzes how the geometric properties of self-supervised vision representations impact approximate nearest neighbor search for semantic image retrieval.

Key contributions

  • Evaluates modern self-supervised vision representations for content-based image retrieval.
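  • Shows that latent space geometry affects Approximate Nearest Neighbor (ANN) indexing.
  • Finds that highly anisotropic representations with high skewness degrade partition-based and hashing-based ANN search, even when their linear probe or k-NN accuracy is unaffected.
  • Finds that representations with higher isotropy and local purity better satisfy the distance-based assumptions of ANN indexes, improving semantic retrieval performance.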
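
To make the isotropy contrast above concrete, here is a minimal sketch that uses mean pairwise cosine similarity as a rough anisotropy proxy. This proxy, the function name `mean_pairwise_cosine`, and the synthetic data are all assumptions for illustration; the summary does not state which isotropy or skewness measures the paper uses.

```python
import math
import random


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all distinct pairs of vectors.

    Values near 0 suggest directions are spread out (closer to isotropic);
    values near 1 suggest most vectors share a dominant direction.
    """
    normed = []
    for v in vectors:
        norm = math.sqrt(sum(x * x for x in v))
        normed.append([x / norm for x in v])
    total, count = 0.0, 0
    for i in range(len(normed)):
        for j in range(i + 1, len(normed)):
            total += sum(a * b for a, b in zip(normed[i], normed[j]))
            count += 1
    return total / count


rng = random.Random(0)
dim, n = 16, 200

# Hypothetical "isotropic" embeddings: zero-mean Gaussian noise in every dimension.
isotropic = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]

# Hypothetical "anisotropic" embeddings: the same noise plus a large shared
# offset, so every vector points in roughly the same direction.
anisotropic = [[x + 5.0 for x in v] for v in isotropic]

iso_score = mean_pairwise_cosine(isotropic)
aniso_score = mean_pairwise_cosine(anisotropic)
print(f"isotropic:   {iso_score:.3f}")
print(f"anisotropic: {aniso_score:.3f}")
```

On this synthetic data the isotropic set should score close to 0 and the shifted set close to 1. Real embeddings would need a measure chosen to match the paper's definitions.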

Why it matters

Modern self-supervised learning (SSL) methods for vision are rarely evaluated in the content-based image retrieval literature, which mostly relies on supervised models or multi-modal methods that align text and vision. This paper evaluates them under typical vector search stacks and shows that the geometry of their representations affects ANN search performance, which can inform the design of both SSL methods and retrieval systems.

Original Abstract

Content-based image retrieval (CBIR) systems enable users to search images based on visual content instead of relying on metadata. The text domain has benefited from vector search of representations created with unsupervised methods such as BERT. However, modern self-supervised learning methods for vision are mostly not reported in CBIR-related literature, instead relying on supervised models or multi-modal methods that align text and vision. We evaluate how the representations learned by modern self-supervised learning methods for vision perform under typical retrieval stacks that leverage vector databases and nearest neighbor search. Our evaluation reveals that the latent space geometry impacts approximate nearest neighbor (ANN) indexing. Specifically, highly anisotropic representations with high skewness produced by several modern SSL methods degrade the performance of partition-based and hashing-based search, even if their own linear probe or K-NN accuracy is not affected. In contrast, representations with higher isotropy and local purity better satisfy the distance-based assumptions of ANN indexes, leading to improved semantic retrieval performance.
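
As a rough illustration of the "local purity" idea mentioned in the abstract, the sketch below computes the average fraction of each point's k nearest neighbors that share its label. This definition, the function name `local_purity`, the Euclidean metric, and the synthetic clusters are assumptions for illustration and may differ from the paper's formulation.

```python
import math
import random


def local_purity(vectors, labels, k):
    """Mean fraction of each point's k nearest neighbors (Euclidean) sharing its label."""
    scores = []
    for i, v in enumerate(vectors):
        # Exact brute-force neighbor search, excluding the point itself.
        dists = sorted(
            (math.dist(v, w), j) for j, w in enumerate(vectors) if j != i
        )
        same = sum(1 for _, j in dists[:k] if labels[j] == labels[i])
        scores.append(same / k)
    return sum(scores) / len(scores)


def make_clusters(rng, center_a, center_b, n_per_class):
    """Two 2-D Gaussian clusters (unit variance) with labels 0 and 1."""
    vectors, labels = [], []
    for label, center in enumerate((center_a, center_b)):
        for _ in range(n_per_class):
            vectors.append([rng.gauss(c, 1.0) for c in center])
            labels.append(label)
    return vectors, labels


rng = random.Random(0)

# Well-separated classes: neighbors almost always share the query's label.
sep_vectors, sep_labels = make_clusters(rng, (0.0, 0.0), (10.0, 10.0), 50)
# Fully overlapping classes: neighbors are a near-random mix of labels.
mix_vectors, mix_labels = make_clusters(rng, (0.0, 0.0), (0.0, 0.0), 50)

separated_purity = local_purity(sep_vectors, sep_labels, k=5)
mixed_purity = local_purity(mix_vectors, mix_labels, k=5)
print(f"separated: {separated_purity:.2f}")
print(f"mixed:     {mixed_purity:.2f}")
```

The brute-force search here is exact. The abstract's point is that approximate indexes rely on such neighborhoods being well behaved, so a low score on a measure like this would be one hint that semantic retrieval may suffer.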
