Improving Random Testing via LLM-powered UI Tarpit Escaping for Mobile Apps

April 8, 2026

arXiv:2604.06763

Mengqian Xu, Yiheng Xiong, Le Chang, Ting Su, Chengcheng Wan + 1 more

cs.SE

TLDR

This paper introduces LLM-powered random GUI testing to escape UI tarpits in mobile apps, significantly improving test coverage and bug detection.

Key contributions

Introduces LLM-powered random GUI testing to mitigate UI tarpits in mobile app exploration.
Monitors UI similarity to identify tarpits and queries LLMs for promising escape events.
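Implemented as HybridMonkey (built on Monkey) and HybridDroidbot (built on Droidbot), both of which outperform all baselines in coverage and unique crashes detected.
Achieves average coverage improvements of 54.8% (HybridMonkey) and 44.8% (HybridDroidbot) on 12 real-world apps, finding 75 unique bugs (34 previously unknown).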
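
The monitor-then-escape idea in the contributions above can be illustrated with a small sketch. It is not the paper's implementation: the summary does not say how UI similarity is computed, so this assumes Jaccard similarity over the set of widget identifiers on screen, a sliding window of recent states, and a fixed threshold. The names `TarpitMonitor`, `next_event`, `random_event`, and `ask_llm` are hypothetical, and `ask_llm` stands in for whatever LLM query a real tool would issue.

```python
from collections import deque
from typing import Callable, Deque, FrozenSet, Iterable, List


def jaccard(a: FrozenSet[str], b: FrozenSet[str]) -> float:
    """Jaccard similarity of two widget-id sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class TarpitMonitor:
    """Flags a suspected tarpit when recent UI states all look alike.

    Assumption: a UI state is summarized by the ids of its visible widgets.
    """

    def __init__(self, window: int = 5, threshold: float = 0.8) -> None:
        self.window = window
        self.threshold = threshold
        self.recent: Deque[FrozenSet[str]] = deque(maxlen=window)

    def observe(self, widget_ids: Iterable[str]) -> bool:
        """Record a state; return True if the last `window` states are similar."""
        self.recent.append(frozenset(widget_ids))
        if len(self.recent) < self.window:
            return False  # not enough history yet
        newest = self.recent[-1]
        return all(jaccard(newest, s) >= self.threshold for s in self.recent)


def next_event(
    monitor: TarpitMonitor,
    widget_ids: Iterable[str],
    random_event: Callable[[List[str]], str],
    ask_llm: Callable[[List[str]], str],
) -> str:
    """Pick the next event: random by default, LLM-suggested inside a tarpit."""
    widgets = list(widget_ids)
    if monitor.observe(widgets):
        monitor.recent.clear()  # start a fresh window after the escape attempt
        return ask_llm(widgets)
    return random_event(widgets)
```

With `window=3`, three consecutive observations of the same screen trigger one call to `ask_llm`, after which the tester falls back to random events until the window fills again. Keeping the random generator as the default path keeps LLM calls to the moments when exploration appears stuck.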

Why it matters

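Random GUI testing often gets trapped in UI tarpits, local UI regions that the exploration cannot leave, which limits coverage and bug discovery. This paper pairs random testing with LLM-suggested escape events and reports higher coverage and more unique crashes than all baselines on 12 real-world apps, as well as higher activity coverage and more bugs than random testing on WeChat.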

Original Abstract

Random GUI testing is a widely-used technique for testing mobile apps. However, its effectiveness is limited by the notorious issue -- UI exploration tarpits, where the exploration is trapped in local UI regions, thus impeding test coverage and bug discovery. In this experience paper, we introduce LLM-powered random GUI Testing, a novel hybrid testing approach to mitigating UI tarpits during random testing. Our approach monitors UI similarity to identify tarpits and query LLMs to suggest promising events for escaping the encountered tarpits. We implement our approach on top of two different automated input generation (AIG) tools for mobile apps: (1) HybridMonkey upon Monkey, a state-of-the-practice tool; and (2) HybridDroidbot upon Droidbot, a state-of-the-art tool. We evaluated them on 12 popular, real-world apps. The results show that HybridMonkey and HybridDroidbot outperform all baselines, achieving average coverage improvements of 54.8% and 44.8%, respectively, and detecting the highest number of unique crashes. In total, we found 75 unique bugs, including 34 previously unknown bugs. To date, 26 bugs have been confirmed and fixed. We also applied HybridMonkey on WeChat, a popular industrial app with billions of monthly active users. HybridMonkey achieved higher activity coverage and found more bugs than random testing.
