ArXiv TLDR

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

arXiv: 2604.11784

Fei Tang, Zhiqiong Lu, Boxuan Zhang, Weiming Lu, Jun Xiao + 2 more

cs.LG · cs.AI · cs.CL · cs.CV

TLDR

ClawGUI is an open-source framework that unifies training, evaluation, and deployment for GUI agents, addressing key infrastructure bottlenecks.

Key contributions

  • ClawGUI-RL provides open-source RL infrastructure for GUI agents, supporting parallel virtual/physical environments.
  • ClawGUI-Eval offers a standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines.
  • ClawGUI-Agent deploys trained agents to Android, HarmonyOS, and iOS via 12+ chat platforms with hybrid control.
  • ClawGUI-2B achieves a 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.

Why it matters

ClawGUI provides a much-needed unified, open-source infrastructure for GUI agents, tackling critical bottlenecks in training, evaluation, and deployment. By standardizing these processes and enabling real-world application, it accelerates progress toward agents that can interact with diverse software.

Original Abstract

GUI agents drive applications through their visual interfaces instead of programmatic APIs, interacting with arbitrary software via taps, swipes, and keystrokes, reaching a long tail of applications that CLI-based agents cannot. Yet progress in this area is bottlenecked less by modeling capacity than by the absence of a coherent full-stack infrastructure: online RL training suffers from environment instability and closed pipelines, evaluation protocols drift silently across works, and trained agents rarely reach real users on real devices. We present **ClawGUI**, an open-source framework addressing these three gaps within a single harness. **ClawGUI-RL** provides the first open-source GUI agent RL infrastructure with validated support for both parallel virtual environments and real physical devices, integrating GiGPO with a Process Reward Model for dense step-level supervision. **ClawGUI-Eval** enforces a fully standardized evaluation pipeline across 6 benchmarks and 11+ models, achieving 95.8% reproduction against official baselines. **ClawGUI-Agent** brings trained agents to Android, HarmonyOS, and iOS through 12+ chat platforms with hybrid CLI-GUI control and persistent personalized memory. Trained end to end within this pipeline, **ClawGUI-2B** achieves 17.1% Success Rate on MobileWorld GUI-Only, outperforming the same-scale MAI-UI-2B baseline by 6.0%.
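The abstract's mention of pairing GiGPO with a Process Reward Model for "dense step-level supervision" can be pictured with a minimal sketch: blend per-step PRM scores with the sparse episode outcome, then center rewards across a group of rollouts, GRPO-style. This is an illustrative assumption about the general technique, not the paper's actual implementation; the function names, the blending rule, and the `beta` weight are all hypothetical.

```python
def step_level_rewards(prm_scores, final_success, beta=0.5):
    """Shape a dense per-step reward for one rollout.

    prm_scores: per-action scores in [0, 1] from a process reward model
                (hypothetical interface, not ClawGUI's actual API).
    final_success: 1.0 if the task succeeded, else 0.0 (sparse outcome).
    beta: weight on the dense PRM signal vs. the terminal outcome.
    """
    return [beta * s + (1.0 - beta) * final_success for s in prm_scores]


def group_relative_advantages(group_rewards):
    """Center rewards across a group of rollouts for the same task,
    as in group-relative policy optimization methods."""
    mean = sum(group_rewards) / len(group_rewards)
    return [r - mean for r in group_rewards]
```

For example, a successful two-step rollout with PRM scores `[1.0, 0.0]` yields shaped rewards `[1.0, 0.5]` under `beta=0.5`, so the low-quality intermediate step still receives a distinct, weaker signal than the good one.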
