ArXiv TLDR

Modeling Subjective Urban Perception with Human Gaze

arXiv:2605.00764

Lin Che, Xi Wang, Marc Pollefeys, Konrad Schindler, Martin Raubal, et al.

cs.CV, cs.HC

TLDR

This paper introduces a new dataset and framework for modeling subjective urban perception by pairing street view images with human gaze data.

Key contributions

  • Introduced Place Pulse-Gaze, a dataset augmenting street view images with synchronized eye-tracking recordings and individual perception labels.
  • Proposed a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to modeling subjective urban perception.
  • Demonstrated that gaze alone carries useful predictive signal for urban perception, and that fusing gaze with scene representations further improves prediction (a minimal sketch follows this list).
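
As a rough illustration of the gaze-only setting, the sketch below encodes a fixation sequence with a small recurrent network and regresses a scalar perception score. The model class, input format (normalized x, y plus duration per fixation), and GRU encoder are our assumptions for illustration, not the architecture described in the paper.

```python
import torch
import torch.nn as nn

class GazeOnlyModel(nn.Module):
    """Hypothetical gaze-only predictor: fixation sequence -> perception score."""

    def __init__(self, hidden_dim: int = 64):
        super().__init__()
        # Each fixation is a 3-vector: normalized (x, y) plus duration.
        self.encoder = nn.GRU(input_size=3, hidden_size=hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)  # scalar score, e.g., a "safety" rating

    def forward(self, fixations: torch.Tensor) -> torch.Tensor:
        # fixations: (batch, num_fixations, 3)
        _, h = self.encoder(fixations)       # h: (num_layers, batch, hidden_dim)
        return self.head(h[-1]).squeeze(-1)  # (batch,)

model = GazeOnlyModel()
scores = model(torch.rand(8, 20, 3))  # 8 scanpaths of 20 fixations each
print(scores.shape)                   # torch.Size([8])
```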

Why it matters

This research highlights the critical role of human perceptual processes, specifically gaze, in understanding subjective urban environments. It offers a novel multimodal approach, paving the way for more human-centric urban computing applications.

Original Abstract

Urban perception describes how people subjectively evaluate urban environments, shaping how cities are experienced and understood. Existing computational approaches primarily model urban perception directly from street view images, but largely ignore the human perceptual process through which such judgments are formed. In this paper, we introduce Place Pulse-Gaze, an urban perception dataset that augments street view images with synchronized eye-tracking recordings and individual perception labels. Based on this dataset, we propose a Gaze-Guided Urban Perception Framework to study how gaze behavior contributes to the modeling of subjective urban perception. The framework systematically investigates three complementary settings: gaze-only modeling, gaze fusion with explicit semantic scene representations, and gaze fusion with implicit richer visual representations. Experiments show that gaze alone already carries useful predictive signals for subjective urban perception, and that integrating gaze with scene representations further improves prediction under both semantic and richer visual representations. Overall, our findings highlight the importance of incorporating human perceptual processes into urban scene understanding and open a direction for gaze-guided multimodal urban computing.
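
To make the two fusion settings from the abstract concrete, here is a minimal late-fusion sketch: a gaze embedding is concatenated with a scene embedding and passed through a small MLP head. The scene embedding could stand in for either an explicit semantic representation (e.g., object-class frequencies) or an implicit richer one (e.g., features from a pretrained vision backbone); the fusion operator, dimensions, and names are our assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GazeSceneFusion(nn.Module):
    """Hypothetical late-fusion head: [gaze embedding ; scene embedding] -> score."""

    def __init__(self, gaze_dim: int = 64, scene_dim: int = 512):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(gaze_dim + scene_dim, 128),
            nn.ReLU(),
            nn.Linear(128, 1),  # scalar perception score
        )

    def forward(self, gaze_emb: torch.Tensor, scene_emb: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([gaze_emb, scene_emb], dim=-1)  # (batch, gaze_dim + scene_dim)
        return self.mlp(fused).squeeze(-1)                # (batch,)

fusion = GazeSceneFusion()
score = fusion(torch.rand(8, 64), torch.rand(8, 512))
print(score.shape)  # torch.Size([8])
```

Concatenation plus an MLP is the simplest possible fusion choice; attention-based or gaze-weighted pooling variants would slot into the same interface.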
