ArXiv TLDR

Open-SAT: LLM-Guided Query Embedding Refinement for Open-Vocabulary Object Retrieval in Satellite Imagery

arXiv: 2605.05344

Md Adnan Arefeen, Biplob Debnath, Ravi K. Rajendran, Murugan Sankaradas, Srimat T. Chakradhar

cs.CV · cs.AI · cs.IR

TLDR

Open-SAT improves open-vocabulary satellite image retrieval by using LLMs to refine query embeddings at inference time, achieving up to a 16.04% F1 score gain.

Key contributions

  • Refines VLM-generated query embeddings using LLMs for better alignment with satellite imagery.
  • Operates as a training-free, inference-time algorithm, avoiding additional model training.
  • Leverages contextual information from LLMs about objects and their surroundings for enhanced retrieval.
  • Achieves up to 16.04% F1 score improvement on three public benchmarks for open-vocabulary retrieval.
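The inference-time refinement in the contributions above can be sketched roughly as follows. This is a minimal illustration, not the paper's method: the blending weight `alpha`, the mean-pooling of LLM context embeddings, and the function names are all assumptions, since the exact update rule and the threshold-free retrieval mechanism are not detailed in this summary.

```python
import numpy as np

def refine_query_embedding(query_emb, context_embs, alpha=0.7):
    """Blend the VLM query embedding with LLM-derived context embeddings.
    alpha and mean-pooling are illustrative choices, not the paper's rule."""
    context = np.mean(context_embs, axis=0)
    refined = alpha * query_emb + (1 - alpha) * context
    return refined / np.linalg.norm(refined)

def retrieve(refined_emb, tile_embs, k=5):
    """Rank precomputed tile embeddings by cosine similarity, return top-k.
    Stands in for the vector-database lookup; top-k is a simplification of
    the paper's threshold-free retrieval."""
    tiles = tile_embs / np.linalg.norm(tile_embs, axis=1, keepdims=True)
    scores = tiles @ refined_emb
    return np.argsort(scores)[::-1][:k]

# Toy example: random 512-d vectors stand in for CLIP-style embeddings.
rng = np.random.default_rng(0)
query = rng.normal(size=512)
query /= np.linalg.norm(query)
context = rng.normal(size=(3, 512))   # e.g. LLM descriptions of the object
tiles = rng.normal(size=(100, 512))   # embeddings of satellite image tiles
top_tiles = retrieve(refine_query_embedding(query, context), tiles, k=5)
print(top_tiles)
```

Because refinement touches only the query vector, the tile embeddings and vector index never change, which is what makes the approach training-free and deployable on an existing retrieval stack.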

Why it matters

This paper addresses a critical challenge in satellite applications: retrieving image tiles that match open-ended natural language queries. By leveraging LLMs for query refinement without retraining, Open-SAT offers a practical and effective solution. Its training-free nature makes it highly adaptable and immediately deployable for improving satellite image analysis.

Original Abstract

In satellite applications, user queries often take the form of open-ended natural language, extending beyond a fixed set of predefined categories. This open-vocabulary nature poses significant challenges for retrieving relevant image tiles, as the retrieval system must generalize to a wide range of unseen objects and concepts. While vision-language models (VLMs) such as CLIP are widely used for text-image retrieval, even fine-tuned variants often struggle to accurately align such queries with satellite imagery. To address this, we propose Open-SAT, a training-free query embedding refinement algorithm that operates at inference time to improve alignment between user queries and satellite image content. Open-SAT uses VLMs to compute embeddings for image tiles, which are stored in a vector database for efficient retrieval. At query time, it leverages Large Language Models (LLMs) to refine the text embeddings by incorporating contextual information about objects of interest and their surroundings. A threshold-free retrieval mechanism further enhances accuracy and efficiency. Experimental results in three public benchmarks demonstrate that Open-SAT improves the F1 score by up to 16.04%, while retrieving a comparable number of image tiles. These results demonstrate the effectiveness of Open-SAT in open-vocabulary satellite image retrieval, leveraging LLM guidance without the need for additional training or supervision.

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.