ArXiv TLDR

Wayne Xin Zhao

6 papers · Latest:

Natural Language Processing

ClawGym: A Scalable Framework for Building Effective Claw Agents

ClawGym introduces a scalable framework for developing Claw-style agents, including a synthetic dataset, trained models, and an evaluation benchmark.

2604.26904
Computer Vision

Improving Vision-language Models with Perception-centric Process Reward Models

Perceval is a new process reward model that improves vision-language models by providing token-level supervision to identify and correct perceptual errors.

2604.24583
Natural Language Processing

ArbGraph: Conflict-Aware Evidence Arbitration for Reliable Long-Form Retrieval-Augmented Generation

ArbGraph improves long-form RAG reliability through pre-generation evidence arbitration, resolving factual conflicts before text generation.

2604.18362
Natural Language Processing

Toward Autonomous Long-Horizon Engineering for ML Research

AiScientist is a new system for autonomous long-horizon ML research engineering, using hierarchical orchestration and a File-as-Bus workspace for durable state continuity.

2604.13018
Computer Vision

Towards Long-horizon Agentic Multimodal Search

LMM-Searcher enables long-horizon multimodal search by offloading visual data to files and using UIDs, and it achieves SOTA performance.

2604.12890

InCoder-32B-Thinking: Industrial Code World Model for Thinking

InCoder-32B-Thinking generates expert reasoning traces for industrial code by combining error-driven chain-of-thought with a hardware-aware world model.

2604.03144