ArXiv TLDR

OT on the Map: Quantifying Domain Shifts in Geographic Space

2604.16220

Haoran Zhang, Livia Betti, Konstantin Klemmer, Esther Rolf, David Alvarez-Melis

cs.LG

TLDR

GeoSpOT uses Optimal Transport to quantify domain shifts in geographic space; its distances predict how difficult it will be to transfer a geospatial model from one domain to another.

Key contributions

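  • Proposes GeoSpOT, a method that combines geographic information with Optimal Transport to compute distances between geospatial domains.
  • Shows that GeoSpOT distances are effective predictors of cross-domain transfer difficulty for geospatial models.
  • Demonstrates that embeddings from pretrained location encoders, which take only longitude-latitude pairs as input, provide information comparable to image/text embeddings for approximating out-of-domain (OOD) performance.
  • Shows that GeoSpOT distances can preemptively guide data selection and support predictive tools for identifying regions where a model is likely to underperform.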
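To make "an Optimal Transport distance between geospatial domains" concrete, here is a minimal sketch, assuming each domain is just a set of sample locations given as (longitude, latitude) with uniform weights, a great-circle (haversine) ground cost, and an entropic Sinkhorn solver. This is not the authors' GeoSpOT implementation, which per the abstract can also use embeddings from pretrained location encoders or image/text features; the function names, the `eps` setting, and the example domains are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(p, q):
    """Great-circle distance in km between two (lon, lat) points in degrees."""
    lon1, lat1 = map(math.radians, p)
    lon2, lat2 = map(math.radians, q)
    dlon, dlat = lon2 - lon1, lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))

def sinkhorn_distance(xs, ys, cost, eps=0.05, n_iter=500):
    """Entropic OT cost <P, C> between uniform empirical measures on xs and ys.

    eps is relative to the largest pairwise cost, so it does not depend on units.
    """
    n, m = len(xs), len(ys)
    C = [[cost(x, y) for y in ys] for x in xs]
    scale = max(max(row) for row in C) or 1.0
    K = [[math.exp(-c / (scale * eps)) for c in row] for row in C]
    a, b = [1.0 / n] * n, [1.0 / m] * m  # uniform weights on each domain
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iter):
        # alternate scalings so the plan's row sums match a and column sums match b
        u = [a[i] / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # plan P_ij = u_i * K_ij * v_j; report the transport cost in the cost's units (km here)
    return sum(u[i] * K[i][j] * v[j] * C[i][j] for i in range(n) for j in range(m))

# Hypothetical domains: sample locations given as (longitude, latitude).
coastal_west_africa = [(-0.19, 5.60), (3.38, 6.52), (-1.62, 6.69)]  # Accra, Lagos, Kumasi
sahel = [(2.11, 13.51), (-1.53, 12.37)]                              # Niamey, Ouagadougou
east_asia = [(116.40, 39.90), (121.47, 31.23)]                       # Beijing, Shanghai

print(round(sinkhorn_distance(coastal_west_africa, sahel, haversine_km)))
print(round(sinkhorn_distance(coastal_west_africa, east_asia, haversine_km)))
```

The Sahel set comes out far closer to the coastal West African set than the East Asian set does. Because of the entropic regularization, the distance from a set to itself is small but not exactly zero; a lower `eps` tightens this at the cost of slower convergence.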

Why it matters

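Out-of-domain generalization is a major challenge in geographic machine learning, because global data coverage is uneven and data distributions shift across regions. This paper introduces GeoSpOT, a principled way to quantify these domain shifts that helps predict whether a model trained in one region will transfer successfully to another. Because pretrained location encoders need only coordinates as input, GeoSpOT can inform data selection and deployment decisions even when the downstream task is unknown or no task-specific data is available.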

Original Abstract

In computer vision and machine learning for geographic data, out-of-domain generalization is a pervasive challenge, arising from uneven global data coverage and distribution shifts across geographic regions. Though models are frequently trained in one region and deployed in another, there is no principled method for determining when this cross-region adaptation will be successful. A well-defined notion of distance between distributions can effectively quantify how different a new target domain is compared to the domains used for model training, which in turn could support model training and deployment decisions. In this paper, we propose a strategy for computing distances between geospatial domains that leverages geographic information with Optimal Transport methods (GeoSpOT). In our experiments, GeoSpOT distances emerge as effective predictors of cross-domain transfer difficulty. We further demonstrate that embeddings from pretrained location encoders provide information comparable to image/text embeddings, despite relying solely on longitude-latitude pairs as input. This allows users to get an approximation of out-of-domain performance for geospatial models, even when the exact downstream task is unknown, or no task-specific data is available. Building on these findings, we show that GeoSpOT distances can preemptively guide data selection and enable predictive tools to analyze regions where a model is likely to underperform.
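The data-selection use case in the abstract can be sketched as ranking candidate training regions by their OT distance to a target region and preferring the closest. This is a minimal sketch under several assumptions: each region is represented by a few toy 2-D vectors standing in for location-encoder embeddings, the region names are hypothetical, the ground cost is Euclidean, and the solver is a brute-force exact OT over permutations (valid only for equal-size, uniformly weighted samples and practical only for very small sets). It is a swapped-in illustration, not the paper's solver or selection procedure.

```python
import itertools
import math

def exact_ot_uniform(xs, ys):
    """Exact OT cost between two equal-size, uniformly weighted point sets.

    With uniform weights and equal sizes, some optimal plan is a permutation
    (Birkhoff-von Neumann), so searching all permutations is exact but O(n!).
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch requires equal-size samples")
    n = len(xs)
    best = math.inf
    for perm in itertools.permutations(range(n)):
        # average Euclidean cost of matching xs[i] to ys[perm[i]]
        total = sum(math.dist(xs[i], ys[perm[i]]) for i in range(n))
        best = min(best, total / n)
    return best

def rank_sources(target, sources):
    """Order candidate source regions from closest to farthest from the target."""
    scored = [(exact_ot_uniform(emb, target), name) for name, emb in sources.items()]
    return [name for _, name in sorted(scored)]

# Hypothetical embeddings for a target region and three candidate source regions.
target = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
sources = {
    "region_a": [(0.1, 0.1), (1.1, 0.0), (0.0, 0.9)],
    "region_b": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
    "region_c": [(2.0, 0.0), (3.0, 0.0), (2.0, 1.0)],
}

print(rank_sources(target, sources))
```

Under the paper's finding that smaller GeoSpOT distances indicate easier transfer, the first region in this ranking would be the preferred source of training data; for realistic sample sizes a proper OT solver would replace the permutation search.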
