Learn Where to Click from Yourself: On-Policy Self-Distillation for GUI Grounding
Yan Zhang, Daiqing Wu, Huawen Shen, Yu Zhou, Can Ma
TLDR
GUI-SD introduces an on-policy self-distillation framework for GUI grounding, outperforming prior RL methods in accuracy and efficiency.
Key contributions
- Introduces GUI-SD, the first on-policy self-distillation (OPSD) framework tailored for GUI grounding.
- Constructs a visually enriched privileged context for the teacher using bounding boxes and soft masks.
- Employs entropy-guided distillation to adaptively weight tokens based on digit significance and teacher confidence.
- Consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency.
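The privileged teacher context above centers on a Gaussian soft mask derived from the target bounding box. The paper does not publish the mask formula here, so the following is a minimal illustrative sketch: a 2D Gaussian peaked at the box center, with its spread tied to the box size (the function name and `sigma_scale` parameter are assumptions, not the authors' API).

```python
import numpy as np

def gaussian_soft_mask(h, w, bbox, sigma_scale=0.5):
    """Illustrative soft mask peaked at the bbox center (hypothetical sketch).

    bbox = (x0, y0, x1, y1) in pixel coordinates. The standard deviation is
    tied to the box size, so the mask hints at the target region without
    leaking its exact coordinates.
    """
    x0, y0, x1, y1 = bbox
    cx, cy = (x0 + x1) / 2.0, (y0 + y1) / 2.0
    sx = max((x1 - x0) * sigma_scale, 1.0)
    sy = max((y1 - y0) * sigma_scale, 1.0)
    ys, xs = np.mgrid[0:h, 0:w]
    # Unnormalized Gaussian: 1.0 at the center, decaying with distance.
    return np.exp(-(((xs - cx) / sx) ** 2 + ((ys - cy) / sy) ** 2) / 2.0)

mask = gaussian_soft_mask(8, 8, (2, 2, 6, 6))
```

Such a mask can be overlaid on the screenshot shown to the teacher, giving it a spatial prior that the student never sees.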
Why it matters
GUI grounding is a core capability for autonomous GUI agents, but current RL methods require expensive multiple rollouts and provide only sparse reward signals on hard samples. This paper addresses these limitations with on-policy self-distillation, which supplies dense token-level supervision from a single rollout, making GUI-SD both more efficient to train and more accurate than prior approaches.
Original Abstract
Graphical User Interface (GUI) grounding maps natural language instructions to the visual coordinates of target elements and serves as a core capability for autonomous GUI agents. Recent reinforcement learning methods (e.g., GRPO) have achieved strong performance, but they rely on expensive multiple rollouts and suffer from sparse signals on hard samples. These limitations make on-policy self-distillation (OPSD), which provides dense token-level supervision from a single rollout, a promising alternative. However, its applicability to GUI grounding remains unexplored. In this paper, we present GUI-SD, the first OPSD framework tailored for GUI grounding. First, it constructs a visually enriched privileged context for the teacher using a target bounding box and a Gaussian soft mask, providing informative guidance without leaking exact coordinates. Second, it employs entropy-guided distillation, which adaptively weights tokens based on digit significance and teacher confidence, concentrating optimization on the most impactful and reliable positions. Extensive experiments on six representative GUI grounding benchmarks show that GUI-SD consistently outperforms GRPO-based methods and naive OPSD in both accuracy and training efficiency. Code and training data are available at https://zhangyan-ucas.github.io/GUI-SD/.
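The entropy-guided distillation described in the abstract weights each token by two factors: how significant the digit is within a coordinate (a wrong leading digit shifts the click far more than a wrong trailing digit) and how confident the teacher is (low-entropy teacher distributions are more reliable). The exact weighting scheme is not given here, so the sketch below uses illustrative formulas; `token_weights`, the exponential-decay significance, and the `exp(-entropy)` confidence term are all assumptions.

```python
import math

def token_weights(teacher_probs, digit_positions, decay=0.5):
    """Hypothetical per-token distillation weights.

    teacher_probs:   one probability distribution per output token.
    digit_positions: place of each token within its coordinate number
                     (0 = most significant digit; None = non-digit token).
    Confident teacher distributions (low entropy) and more significant
    digits receive larger weights; both formulas are illustrative only.
    """
    weights = []
    for probs, pos in zip(teacher_probs, digit_positions):
        entropy = -sum(p * math.log(p) for p in probs if p > 0)
        confidence = math.exp(-entropy)  # 1.0 when the teacher is certain
        significance = decay ** pos if pos is not None else decay
        weights.append(confidence * significance)
    return weights
```

For example, a fully confident teacher distribution on a leading digit gets weight 1.0, while a uniform (maximally uncertain) two-way distribution on the next digit gets 0.5 × 0.5 = 0.25, concentrating the loss on impactful, reliable positions.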