BAMI: Training-Free Bias Mitigation in GUI Grounding

May 7, 2026 | arXiv:2605.06664

Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng + 4 more

cs.CV, cs.AI

TLDR

BAMI is a training-free method that uses coarse-to-fine focus and candidate selection to mitigate precision and ambiguity biases in GUI grounding models.

Key contributions

Identifies precision and ambiguity biases in GUI grounding using Masked Prediction Distribution (MPD).
Introduces BAMI, a training-free method to mitigate biases via coarse-to-fine focus and candidate selection.
Improves the accuracy of GUI grounding models without retraining (e.g., TianXi-Action-7B on ScreenSpot-Pro rises from 51.9% to 57.8%, a gain of 5.9 percentage points).
Shows in ablation studies that the method remains stable and effective across diverse parameter configurations.

Why it matters

Existing GUI grounding models struggle with complex interfaces because of two specific biases: precision bias, caused by high image resolution, and ambiguity bias, caused by intricate interface elements. BAMI addresses both at inference time, with no additional training. Its gains on ScreenSpot-Pro make it a practical step toward more reliable GUI agents.

Original Abstract

GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
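
To make the coarse-to-fine idea concrete, here is a minimal Python sketch of one way such a focus loop could work: predict a click point on the full screenshot, crop a smaller window around that point, predict again inside the crop, and map the result back to full-image pixels. This is an illustration under stated assumptions, not the authors' implementation (see the linked repository for that). The names `coarse_to_fine`, `crop_around`, `to_full`, and `toy_predict`, the `steps` and `shrink` parameters, and the assumption that the model returns normalized coordinates relative to the crop are all hypothetical. Candidate selection is not sketched here.

```python
from typing import Callable, Tuple

Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in full-image pixels
Point = Tuple[float, float]


def to_full(rel: Point, box: Box) -> Point:
    """Map a crop-relative point in [0, 1] x [0, 1] back to full-image pixels."""
    left, top, right, bottom = box
    return (left + rel[0] * (right - left), top + rel[1] * (bottom - top))


def crop_around(point: Point, box: Box, shrink: float) -> Box:
    """Return a window `shrink` times the size of `box`, centered on `point`
    and shifted where necessary so that it stays inside `box`."""
    left, top, right, bottom = box
    w, h = (right - left) * shrink, (bottom - top) * shrink
    new_left = min(max(point[0] - w / 2, left), right - w)
    new_top = min(max(point[1] - h / 2, top), bottom - h)
    return (new_left, new_top, new_left + w, new_top + h)


def coarse_to_fine(
    predict: Callable[[Box], Point],
    width: int,
    height: int,
    steps: int = 2,
    shrink: float = 0.5,
) -> Point:
    """Refine a predicted point by repeatedly zooming in around it.

    `predict(box)` is assumed to run the grounding model on the screenshot
    region `box` and return a point normalized to that region.
    """
    box: Box = (0.0, 0.0, float(width), float(height))
    point = to_full(predict(box), box)
    for _ in range(steps):
        box = crop_around(point, box, shrink)
        point = to_full(predict(box), box)
    return point


def toy_predict(box: Box) -> Point:
    """Stand-in for a real model: finds a fixed target at (1500, 300) but with
    a constant error relative to the crop size, so zooming in shrinks the
    pixel error."""
    left, top, right, bottom = box
    return ((1500.0 - left) / (right - left) + 0.02,
            (300.0 - top) / (bottom - top) - 0.02)


if __name__ == "__main__":
    print(coarse_to_fine(toy_predict, 3840, 2160, steps=0))  # single coarse pass
    print(coarse_to_fine(toy_predict, 3840, 2160, steps=2))  # two zoom-in passes
```

In this toy setup the predictor's error is a fixed fraction of the crop size, so each halving of the window halves the pixel error. A real model's behavior on crops may differ, which is why the number of steps and the shrink factor are left as tunable parameters in this sketch.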
