Beyond Chat and Clicks: GUI Agents for In-Situ Assistance via Live Interface Transformation
Pan Hao, Rishi Selvakumaran, Jacob Sun, Qianwen Wang
TLDR
This paper introduces DOMSteer, a GUI agent that provides in-situ assistance by transforming live web interfaces through browser-level DOM manipulations.
Key contributions
- Introduces "in-situ assistance" for live web interfaces via browser-level DOM interventions.
- Proposes a design space and computational pipeline for DOM-mediated GUI agents.
- Presents DOMSteer, a Chrome extension transforming live pages with contextual tooltips and layout changes.
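- Evaluates DOMSteer quantitatively on two complex visual interfaces and in a comparative user study against a ChatGPT Atlas baseline, reporting reliable, efficient assistance and demonstrating its usability and effectiveness.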
Why it matters
This paper addresses the steep learning curve of complex visual interfaces by enabling GUI agents to actively reconfigure live web pages. It moves beyond chat-based help and costly app-specific engineering, offering a direct way to assist users in real time, inside the interface they are already using.
Original Abstract
Complex visual interfaces are powerful yet have a steep learning curve, as users must navigate feature-rich visual interfaces while reasoning about domain-specific operations. Existing approaches either deliver assistance through a separate chat-based interaction, or require substantial application-specific engineering to build support natively into each interface. To address the gaps, we propose in-situ assistance: a mode of support delivered directly within any live web interface through lightweight, browser-level interventions on the Document Object Model (DOM), without rebuilding the application or modifying its underlying logic. We contribute a design space and a computational pipeline for DOM-mediated in-situ assistance, characterizing how GUI agents can insert, mutate, or recompose web elements to make the interface easier for users to understand, use, and navigate. We instantiate in-situ assistance in DOMSteer, a Chrome extension that interprets a user's help request and live interface context, grounds it to relevant UI elements, and executes reversible DOM manipulations directly on the live page to deliver assistance, including contextual tooltips, control highlighting, and layout reorganization. Quantitative evaluations on two complex visual interfaces show that DOMSteer delivers reliable and efficient in-situ assistance. Use cases and a comparative user study with baseline ChatGPT Atlas demonstrate the usability and effectiveness of DOMSteer. Altogether, these findings point to a broader role for GUI agents: not just assisting from the sidelines, but actively reconfiguring live interfaces to support users in the moment.
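To make the idea of "reversible DOM manipulations" concrete, here is a minimal TypeScript sketch of one way such interventions could be recorded and undone. It is not taken from the paper: the names `ElementLike`, `setAttributeReversibly`, `InterventionLog`, and `FakeElement` are hypothetical, and the highlight style and tooltip mechanism (the `title` attribute) are assumptions chosen for brevity. The in-memory `FakeElement` exists only so the sketch runs outside a browser.

```typescript
// Hypothetical sketch of reversible interventions; not DOMSteer's actual code.

/** Minimal attribute API shared by browser elements and the test stand-in below. */
interface ElementLike {
  getAttribute(name: string): string | null;
  setAttribute(name: string, value: string): void;
  removeAttribute(name: string): void;
}

type Undo = () => void;

/** Set an attribute and return a callback that restores the previous state. */
function setAttributeReversibly(el: ElementLike, name: string, value: string): Undo {
  const previous = el.getAttribute(name);
  el.setAttribute(name, value);
  return () => {
    if (previous === null) {
      el.removeAttribute(name);
    } else {
      el.setAttribute(name, previous);
    }
  };
}

/** Records undo callbacks so a whole assistance step can be rolled back. */
class InterventionLog {
  private undos: Undo[] = [];

  /** Assumed highlight: replaces the inline style while active, restores it on undo. */
  highlight(el: ElementLike): void {
    this.undos.push(setAttributeReversibly(el, "style", "outline: 3px solid orange"));
  }

  /** Assumed tooltip: uses the native `title` attribute instead of a custom overlay. */
  tooltip(el: ElementLike, text: string): void {
    this.undos.push(setAttributeReversibly(el, "title", text));
  }

  /** Undo in reverse order so the most recent change is unwound first. */
  revertAll(): void {
    while (this.undos.length > 0) {
      const undo = this.undos.pop();
      if (undo) undo();
    }
  }
}

/** In-memory stand-in for an element, so the sketch runs without a browser. */
class FakeElement implements ElementLike {
  private attrs = new Map<string, string>();

  getAttribute(name: string): string | null {
    const value = this.attrs.get(name);
    return value === undefined ? null : value;
  }

  setAttribute(name: string, value: string): void {
    this.attrs.set(name, value);
  }

  removeAttribute(name: string): void {
    this.attrs.delete(name);
  }
}
```

In a browser, a real element (for example, the result of `document.querySelector`) satisfies `ElementLike` structurally and could be passed in place of `FakeElement`. Layout reorganization would need richer undo records, such as an element's original parent and sibling, which this sketch leaves out.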