Xiaoyu Li

2 papers · Latest: April 13, 2026

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

General365 is a new benchmark assessing LLMs' general reasoning, revealing their domain-dependent abilities and significant room for improvement beyond specialized tasks.

2604.11778 · Apr 13, 2026

Computer Vision

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

LARY introduces a benchmark and dataset for evaluating latent action representations, showing general visual models excel and latent spaces align better with physical actions.

2604.11689 · Apr 13, 2026
