Intent Lenses: Inferring Capture-Time Intent to Transform Opportunistic Photo Captures into Structured Visual Notes

April 10, 2026

arXiv:2604.09438

Ashwin Ram, Aeneas Leon Sommer, Martin Schmitz, Jürgen Steimle

cs.HC

TLDR

Intent Lenses use AI to infer user intent from opportunistic photos, transforming them into structured, interactive visual notes for better sensemaking.

Key contributions

Introduces "Intent Lenses," a new primitive for intent-mediated note generation from photos.
Uses LLMs to infer capture-time intent from photos and reify it as reusable, interactive lens objects.
Instantiates a system for academic conference photos, generating structured notes on a spatial canvas.
A study with nine academics showed that intent-mediated notes aligned with users' expectations and facilitated sensemaking.

Why it matters

Opportunistic photos are fast to take but rarely become meaningful notes, and existing automatic note generators tend to produce generic summaries. By inferring what the user intended to capture, this paper generates structured, interactive visual notes that align with user expectations and support deeper sensemaking.

Original Abstract

Opportunistic photo capture (e.g., slides, exhibits, or artifacts) is a common strategy for preserving information encountered in information-rich environments for later revisitation. While fast and minimally disruptive, such photo collections rarely become meaningful notes. Existing automatic note-generation approaches provide some support but often produce generic summaries that fail to reflect what users intended to capture. We introduce Intent Lenses, a conceptual primitive for intent-mediated note generation and sensemaking. Intent Lenses reify users' capture-time intent inferred from captured information into reusable interactive objects that encode the function to perform, the information sources to focus on, and how results are represented at an appropriate level of detail. These lenses are dynamically generated using the reasoning capabilities of large language models. To investigate this concept, we instantiate Intent Lenses in the context of academic conference photos and present an interactive system that infers lenses from presentation captures to generate structured visual notes on a spatial canvas. Users can further add, link, and arrange lenses across captures to support exploration and sensemaking. A study with nine academics showed that intent-mediated notes aligned with users' expectations, providing effective overviews of their captures while facilitating deeper sensemaking.
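The abstract describes a lens as an object that encodes three things: the function to perform, the information sources to focus on, and how results are represented at a given level of detail. A minimal Python sketch of such an object might look like the following. It is an illustration only, not the authors' implementation: the class name, field names, default value, and the `to_prompt` helper are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntentLens:
    """Hypothetical container for one inferred capture-time intent."""

    function: str          # what to do with the captures, e.g. "summarize the method"
    sources: tuple         # which captures to focus on (assumed: photo identifiers)
    representation: str    # how the result is shown, e.g. "bullet list"
    detail: str = "brief"  # assumed level-of-detail setting

    def to_prompt(self) -> str:
        """Render the lens as a plain-text instruction for a language model."""
        return (
            f"Task: {self.function}. "
            f"Use only these captures: {', '.join(self.sources)}. "
            f"Present the result as a {self.representation} at {self.detail} detail."
        )


# Example lens for two slide photos (identifiers are made up for illustration).
example = IntentLens(
    function="summarize the method",
    sources=("photo_03", "photo_04"),
    representation="bullet list",
)
print(example.to_prompt())
```

Keeping the lens as a small immutable record is one plausible way to make it reusable: the same object could be applied to other captures or rearranged on a canvas without changing its meaning.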

