From 'Here' to 'There': Exploring Proximity Semantics in Multimodal Data Exploration
Dennis Bromley, Diana Wang, Vidya Setlur
TLDR
A multimodal data exploration system integrates free-form sketching, natural language, and visual annotations, and leverages "proximity semantics," meaning shaped by the relative placement of inputs, to disambiguate user intent.
Key contributions
- Introduces a multimodal probe for time-series and geospatial data exploration.
- Combines free-form sketching, natural language, and visual annotations in a unified space.
- Employs a hybrid architecture using geometric sketch matching and visual language models (VLMs).
- Identifies "proximity semantics" where meaning is shaped by the closeness of multimodal elements.
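The paper does not publish an implementation, but the "proximity semantics" idea can be illustrated with a minimal sketch: associate each annotation with the nearest sketch element in the shared canvas, so that placement, not explicit linking, resolves what the annotation refers to. All names and data below are hypothetical.

```python
from dataclasses import dataclass
import math

@dataclass
class Element:
    """A canvas element (sketch stroke or annotation) with a centroid position."""
    label: str
    x: float
    y: float

def nearest_sketch(annotation: Element, sketches: list[Element]) -> Element:
    """Resolve an annotation's referent as the sketch with the closest centroid."""
    return min(sketches,
               key=lambda s: math.hypot(s.x - annotation.x, s.y - annotation.y))

# Two sketched patterns and one free-floating annotation placed near the second.
sketches = [Element("upward-trend", 100, 40), Element("dip", 300, 80)]
note = Element("exclude holidays", 310, 95)
print(nearest_sketch(note, sketches).label)  # the annotation binds to "dip"
```

A real system would need richer geometry (stroke bounding boxes, temporal proximity, arrow gestures), but the core mechanism, deictic binding by spatial closeness, is what this toy version captures.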
Why it matters
Existing data exploration tools often miss subtle analytical intent. By unifying sketching, natural language, and annotations in one interaction space, this work shows that users rely on the relative placement of those inputs to convey meaning, a pattern the authors term "proximity semantics." The concept offers a concrete framework for designing more intuitive and expressive multimodal data exploration interfaces.
Original Abstract
Modern data exploration tools often struggle to capture the subtleties of analytical intent, especially when users seek patterns that are difficult to specify using traditional query methods or natural language alone. We introduce a multimodal research probe for querying time-series and geospatial data that integrates free-form sketching, natural language, and visual annotations within a unified interaction space. Users articulate queries by sketching trends or spatial paths and augmenting them with annotations and analytical directives grounded in shared spatial and temporal context. The system employs a hybrid architecture combining geometric sketch matching and visual language models (VLMs) to support queries that interleave pattern matching and semantic constraints. Through a preliminary study with 20 participants, we observed recurring interaction patterns in which participants used spatial, temporal, and visual proximity to relate sketches, annotations, and language. Rather than treating these as isolated inputs, participants relied on their relative placement to disambiguate meaning. We analyze these behaviors as evidence for proximity semantics (PS), a form of deictic disambiguation in which meaning is shaped by the closeness of multimodal elements within a shared interaction space. We present PS as a conceptual lens grounded in observed user behavior, and discuss its implications for the design of future multimodal data exploration systems.
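The abstract does not detail the geometric sketch-matching component. A common baseline consistent with it is z-normalized sliding-window matching: the sketched trend and each candidate window of the series are normalized for scale and offset, then compared by Euclidean distance. The sketch below is an assumption-laden illustration, not the authors' method; all names and data are hypothetical.

```python
import math

def znorm(seq):
    """Z-normalize a sequence so matching ignores absolute scale and offset."""
    mean = sum(seq) / len(seq)
    std = math.sqrt(sum((v - mean) ** 2 for v in seq) / len(seq)) or 1.0
    return [(v - mean) / std for v in seq]

def best_match(sketch, series):
    """Return the start index of the series window geometrically closest to the sketch."""
    w = len(sketch)
    q = znorm(sketch)
    best_i, best_d = 0, float("inf")
    for i in range(len(series) - w + 1):
        c = znorm(series[i:i + w])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, c)))
        if d < best_d:
            best_i, best_d = i, d
    return best_i

series = [1, 2, 3, 2, 1, 2, 5, 9, 14, 13]
sketch = [0, 1, 3, 6]  # an accelerating rise, as a user might draw it
print(best_match(sketch, series))
```

Queries that interleave such geometric matches with semantic constraints ("a rise like this, but only on weekends") are where the paper's hybrid architecture hands off to the VLM side.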