Qi Dai

3 papers · Latest: May 12, 2026

Covering Human Action Space for Computer Use: Data Synthesis and Benchmark

This paper introduces CUActSpot, a new benchmark and data synthesis method to improve computer-use agents' reliability on complex, diverse interactions.

2605.12501May 12, 2026

Computer Vision

MM-WebAgent: A Hierarchical Multimodal Web Agent for Webpage Generation

MM-WebAgent is a hierarchical multimodal agent that generates coherent and visually consistent webpages by coordinating AIGC elements through planning and self-reflection.

2604.15309Apr 16, 2026

Computer Vision

AVGen-Bench: A Task-Driven Benchmark for Multi-Granular Evaluation of Text-to-Audio-Video Generation

AVGen-Bench introduces a new benchmark and multi-granular evaluation for Text-to-Audio-Video generation, revealing gaps in semantic reliability.

2604.08540Apr 9, 2026

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week — summarized, scored, and delivered to your inbox every Monday.