CORA: Conformal Risk-Controlled Agents for Safeguarded Mobile GUI Automation
Yushi Feng, Junye Du, Qifan Wang, Zizhan Ma, Qian Niu, et al.
TLDR
CORA is a risk-controlled GUI agent providing statistical guarantees against harmful actions in autonomous mobile GUI automation.
Key contributions
- Introduces CORA, a post-policy, pre-action framework for GUI agents with statistical safety guarantees.
- Uses Conformal Risk Control to calibrate action execution based on a user-specified risk budget.
- Employs Guardian and Diagnostician models for risk estimation and intervention recommendations.
- Introduces Phone-Harm, a new benchmark for mobile safety violations with step-level harm labels.
Why it matters
Autonomous GUI agents risk severe user harm, and existing safeguards lack formal guarantees. CORA provides a practical, statistically grounded safety paradigm, improving the safety-helpfulness-interruption trade-off. This is crucial for reliable and trustworthy VLM-powered autonomous GUI execution.
Original Abstract
Graphical user interface (GUI) agents powered by vision language models (VLMs) are rapidly moving from passive assistance to autonomous operation. However, this unrestricted action space exposes users to severe and irreversible financial, privacy, or social harm. Existing safeguards rely on prompt engineering, brittle heuristics, and VLM-as-critic approaches, which lack formal verification and user-tunable guarantees. We propose CORA (COnformal Risk-controlled GUI Agent), a post-policy, pre-action safeguarding framework that provides statistical guarantees on harmful executed actions. CORA reformulates safety as selective action execution: we train a Guardian model to estimate action-conditional risk for each proposed step. Rather than thresholding raw scores, we leverage Conformal Risk Control to calibrate an execute/abstain boundary that satisfies a user-specified risk budget, and we route rejected actions to a trainable Diagnostician model, which performs multimodal reasoning over rejected actions to recommend interventions (e.g., confirm, reflect, or abort) that minimize user burden. A Goal-Lock mechanism anchors assessment to a clarified, frozen user intent to resist visual injection attacks. To rigorously evaluate this paradigm, we introduce Phone-Harm, a new benchmark of mobile safety violations with step-level harm labels under real-world settings. Experiments on Phone-Harm and public benchmarks against diverse baselines validate that CORA improves the safety--helpfulness--interruption Pareto frontier, offering a practical, statistically grounded safety paradigm for autonomous GUI execution. Code and benchmark are available at cora-agent.github.io.
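The calibration step the abstract describes can be illustrated with a minimal sketch of standard Conformal Risk Control for a binary harm loss. This is not CORA's implementation; the function and variable names (`calibrate_threshold`, `guardian` scores, `alpha`) are illustrative, and it assumes a held-out calibration set of Guardian risk scores with step-level harm labels, as in Phone-Harm.

```python
# Hedged sketch of CRC calibration for selective action execution.
# Assumption: Guardian emits a risk score in [0, 1] per proposed action, and
# the loss is 0/1 (an executed harmful action costs 1), so the CRC bound is
# (n * R_hat(lam) + 1) / (n + 1) <= alpha for a calibration set of size n.

def calibrate_threshold(cal_scores, cal_harmful, alpha, grid_size=1000):
    """Return the largest threshold lam such that executing every action with
    Guardian score <= lam keeps the corrected harmful-execution rate <= alpha."""
    n = len(cal_scores)
    best = 0.0
    for i in range(grid_size + 1):
        lam = i / grid_size
        # Empirical risk: fraction of calibration actions that are harmful
        # AND would be executed under threshold lam.
        risk = sum(1 for s, h in zip(cal_scores, cal_harmful)
                   if h and s <= lam) / n
        if (n * risk + 1.0) / (n + 1) <= alpha:
            best = lam
        else:
            break  # risk is monotone non-decreasing in lam, so stop here
    return best

# Toy calibration data (illustrative, not from Phone-Harm).
scores  = [0.05, 0.10, 0.20, 0.35, 0.50, 0.70, 0.85, 0.90, 0.95, 0.99]
harmful = [0,    0,    0,    0,    1,    0,    1,    1,    1,    1]

lam = calibrate_threshold(scores, harmful, alpha=0.2)
execute = lambda score: score <= lam  # otherwise abstain -> Diagnostician
```

At deployment, actions scoring above `lam` are not executed but routed onward (in CORA, to the Diagnostician for an intervention such as confirm, reflect, or abort), which is what makes the user-specified risk budget `alpha` a statistical guarantee rather than an ad hoc cutoff.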