Jiajun Wu
3 papers · Latest:
Computer Vision
OMIBench: Benchmarking Olympiad-Level Multi-Image Reasoning in Large Vision-Language Model
OMIBench is a new benchmark evaluating large vision-language models' multi-image reasoning at Olympiad level, revealing significant performance gaps.
2604.20806
Artificial Intelligence
Using large language models for embodied planning introduces systematic safety risks
LLMs used for robotic planning show significant safety risks, with even high-performing models generating dangerous plans, highlighting a critical challenge.
2604.18463
InCoder-32B-Thinking: Industrial Code World Model for Thinking
InCoder-32B-Thinking generates expert reasoning traces for industrial code by combining error-driven chain-of-thought with a hardware-aware world model.
2604.03144