From Gaze to Guidance: Interpreting and Adapting to Users' Cognitive Needs with Multimodal Gaze-Aware AI Assistants
Valdemar Danry, Javier Hernandez, Andrew Wilson, Pattie Maes, Judith Amores
TLDR
This paper introduces a gaze-aware AI assistant that interprets users' cognitive needs from egocentric video with gaze overlays, improving information recall and interaction efficiency.
Key contributions
- Develops a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays.
- Identifies likely points of user difficulty and targets retrospective assistance to them.
- Shows, in a controlled study, improved accuracy, personalization, and information recall compared to a text-only LLM assistant.
- Finds that users spoke significantly fewer words, indicating more efficient interactions.
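To make the difficulty-detection idea above concrete, here is a minimal sketch of one plausible heuristic: flagging spans where the gaze lingers in a small region, which could then be used to select frames (with gaze markers overlaid) to send to a multimodal LLM. The names, thresholds, and dwell-time logic are illustrative assumptions, not the paper's actual method.

```python
# Hedged sketch (assumed, not the paper's implementation): detect
# "dwell regions" in a gaze trace as a crude proxy for moments where
# a reader is lingering on a difficult passage.
from dataclasses import dataclass


@dataclass
class GazeSample:
    t: float  # timestamp in seconds
    x: float  # normalized [0, 1] horizontal gaze position
    y: float  # normalized [0, 1] vertical gaze position


def dwell_regions(samples, radius=0.05, min_dwell=1.0):
    """Return (start_t, end_t, cx, cy) spans where gaze stayed within
    `radius` of a running centroid for at least `min_dwell` seconds."""
    regions = []
    if not samples:
        return regions
    start = 0
    cx, cy = samples[0].x, samples[0].y
    n = 1
    for i in range(1, len(samples)):
        s = samples[i]
        if ((s.x - cx) ** 2 + (s.y - cy) ** 2) ** 0.5 <= radius:
            # Still inside the cluster: update the centroid incrementally.
            n += 1
            cx += (s.x - cx) / n
            cy += (s.y - cy) / n
        else:
            # Cluster broken: keep it if the dwell was long enough.
            if samples[i - 1].t - samples[start].t >= min_dwell:
                regions.append((samples[start].t, samples[i - 1].t, cx, cy))
            start, cx, cy, n = i, s.x, s.y, 1
    if samples[-1].t - samples[start].t >= min_dwell:
        regions.append((samples[start].t, samples[-1].t, cx, cy))
    return regions
```

A downstream step might then draw a gaze marker at `(cx, cy)` on the video frame nearest each region and pass those annotated frames to the multimodal LLM as behavioral context.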
Why it matters
By interpreting users' real-time cognitive states through gaze, this work shows how AI assistants can move beyond simple Q&A. That shift paves the way for more adaptive, personalized, and effective human-AI interaction, particularly in learning and assistance scenarios.
Original Abstract
Current LLM assistants are powerful at answering questions, but they have limited access to the behavioral context that reveals when and where a user is struggling. We present a gaze-grounded multimodal LLM assistant that uses egocentric video with gaze overlays to identify likely points of difficulty and target follow-up retrospective assistance. We instantiate this vision in a controlled study (n=36) comparing the gaze-aware AI assistant to a text-only LLM assistant. Compared to a conventional LLM assistant, the gaze-aware assistant was rated as significantly more accurate and personalized in its assessments of users' reading behavior and significantly improved people's ability to recall information. Users spoke significantly fewer words with the gaze-aware assistant, indicating more efficient interactions. Qualitative results underscored both perceived benefits in comprehension and challenges when interpretations of gaze behaviors were inaccurate. Our findings suggest that gaze-aware LLM assistants can reason about cognitive needs to improve cognitive outcomes of users.