BAMI: Training-Free Bias Mitigation in GUI Grounding
Borui Zhang, Bo Zhang, Bo Wang, Wenzhao Zheng, Yuhao Cheng + 4 more
TLDR
BAMI is a training-free method that uses coarse-to-fine focus and candidate selection to mitigate precision and ambiguity biases in GUI grounding models.
Key contributions
- Identifies precision and ambiguity biases in GUI grounding using Masked Prediction Distribution (MPD).
- Introduces BAMI, a training-free method to mitigate biases via coarse-to-fine focus and candidate selection.
- Improves the accuracy of GUI grounding models without any retraining (e.g., TianXi-Action-7B on ScreenSpot-Pro rises from 51.9% to 57.8%, a gain of 5.9 percentage points).
- Shows through ablation studies that the method stays stable and effective across diverse parameter configurations.
Why it matters
Existing GUI grounding models struggle with complex interfaces because of two biases: precision bias, caused by high image resolution, and ambiguity bias, caused by intricate interface elements. BAMI addresses both at inference time, with no additional training. Its gain on ScreenSpot-Pro (51.9% to 57.8% for TianXi-Action-7B) makes it a practical step toward more reliable GUI agents.
Original Abstract
GUI grounding is a critical capability for enabling GUI agents to execute tasks such as clicking and dragging. However, in complex scenarios like the ScreenSpot-Pro benchmark, existing models often suffer from suboptimal performance. Utilizing the proposed Masked Prediction Distribution (MPD) attribution method, we identify that the primary sources of errors are twofold: high image resolution (leading to precision bias) and intricate interface elements (resulting in ambiguity bias). To address these challenges, we introduce Bias-Aware Manipulation Inference (BAMI), which incorporates two key manipulations, coarse-to-fine focus and candidate selection, to effectively mitigate these biases. Our extensive experimental results demonstrate that BAMI significantly enhances the accuracy of various GUI grounding models in a training-free setting. For instance, applying our method to the TianXi-Action-7B model boosts its accuracy on the ScreenSpot-Pro benchmark from 51.9% to 57.8%. Furthermore, ablation studies confirm the robustness of the BAMI approach across diverse parameter configurations, highlighting its stability and effectiveness. Code is available at https://github.com/Neur-IO/BAMI.
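The coarse-to-fine focus idea mentioned in the abstract can be illustrated with a minimal Python sketch. This is an illustration under stated assumptions, not the authors' implementation: the `predict` callable (standing in for any grounding model), the `steps` and `shrink` parameters, and the cropping rule are all hypothetical, and candidate selection is not covered. The actual procedure is in the linked repository.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom) in pixels

# Hypothetical model interface (an assumption, not BAMI's API): given a crop
# box on the screenshot, return the predicted click point in coordinates
# normalized to that crop, each in the range 0..1.
Predictor = Callable[[Box], Point]


def crop_around(center: Point, size: Point, bounds: Box) -> Box:
    """Box of the requested size centered on `center`, shifted to stay inside `bounds`."""
    left0, top0, right0, bottom0 = bounds
    w = min(size[0], right0 - left0)
    h = min(size[1], bottom0 - top0)
    left = min(max(center[0] - w / 2, left0), right0 - w)
    top = min(max(center[1] - h / 2, top0), bottom0 - h)
    return (left, top, left + w, top + h)


def to_global(point: Point, box: Box) -> Point:
    """Map a point normalized to `box` back to full-screenshot pixel coordinates."""
    left, top, right, bottom = box
    return (left + point[0] * (right - left), top + point[1] * (bottom - top))


def coarse_to_fine(predict: Predictor, width: float, height: float,
                   steps: int = 2, shrink: float = 0.5) -> Point:
    """Predict on the full screenshot, then re-predict on progressively smaller crops.

    `steps` and `shrink` are illustrative knobs chosen for this sketch.
    """
    full: Box = (0.0, 0.0, float(width), float(height))
    box = full
    point = to_global(predict(box), box)  # coarse prediction on the whole image
    for _ in range(steps):
        size = ((box[2] - box[0]) * shrink, (box[3] - box[1]) * shrink)
        box = crop_around(point, size, full)  # zoom in around the current guess
        point = to_global(predict(box), box)  # refined prediction, mapped back
    return point
```

The intuition is that a model whose output has limited resolution relative to its input loses fewer pixels of precision on a smaller crop, so each zoom step can tighten the estimate as long as the target stays inside the crop. If the coarse guess is far off, the crop can exclude the target, which is one reason a real system would need something like the candidate selection step the paper describes.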