Jiajun Wu

3 papers · Latest: April 22, 2026

OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model

OMIBench is a new benchmark that evaluates large vision-language models on Olympiad-level multi-image reasoning and reveals significant performance gaps.

2604.20806 · Apr 22, 2026

Artificial Intelligence

Using large language models for embodied planning introduces systematic safety risks

LLMs used for robotic planning pose significant safety risks: even high-performing models generate dangerous plans.

2604.18463 · Apr 20, 2026

InCoder-32B-Thinking: Industrial Code World Model for Thinking

InCoder-32B-Thinking generates expert reasoning traces for industrial code by combining error-driven chain-of-thought with a hardware-aware world model.

2604.03144 · Apr 3, 2026
