Wenhu Chen
5 papers · Latest:
Visual Generation in the New Era: An Evolution from Atomic Mapping to Agentic World Modeling
This paper proposes a new five-level taxonomy for visual generation, shifting from appearance synthesis to intelligent, agentic world modeling.
Tuna-2: Pixel Embeddings Beat Vision Encoders for Multimodal Understanding and Generation
Tuna-2 is a unified multimodal model using pixel embeddings for understanding and generation, outperforming vision encoders and simplifying architecture.
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time
RationalRewards uses explicit, multi-dimensional critiques to improve visual generation at both training and test time, outperforming scalar rewards.
ClawBench: Can AI Agents Complete Everyday Online Tasks?
ClawBench introduces a real-world benchmark of 153 online tasks across 144 live platforms, revealing current AI agents struggle with everyday web automation.
Explanations from Large Language Models Make Small Reasoners Better
This paper shows how explanations generated by large language models can be used to train smaller, more efficient models that achieve superior reasoning accuracy and generate high-quality explanations.