Xiaomin Li

2 papers · Latest: May 13, 2026

AgentLens: Revealing The Lucky Pass Problem in SWE-Agent Evaluation

AgentLens reveals the 'Lucky Pass' problem in SWE-agent evaluation, introducing a process-level framework to assess trajectory quality beyond simple pass/fail.

2605.12925 · May 13, 2026

Computer Vision

Seek-and-Solve: Benchmarking MLLMs for Visual Clue-Driven Reasoning in Daily Scenarios

DailyClue is a new benchmark for MLLMs that evaluates their ability to perform visual clue-driven reasoning in complex, real-world daily scenarios.

2604.14041 · Apr 15, 2026
