Xiaoyu Li
2 papers · Latest:
Natural Language Processing
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
General365 is a new benchmark assessing LLMs' general reasoning, revealing their domain-dependent abilities and significant room for improvement beyond specialized tasks.
2604.11778
Computer Vision
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
LARY introduces a benchmark and dataset for evaluating latent action representations, showing general visual models excel and latent spaces align better with physical actions.
2604.11689