ArXiv TLDR

Weiming Lu

6 papers · Latest:

Natural Language Processing

Pause or Fabricate? Training Language Models for Grounded Reasoning

GRIL is a new RL framework that trains LLMs to detect incomplete information, pause, and clarify, reducing fabrication and improving grounded reasoning.

arXiv:2604.19656

Computer Vision

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

SpatialEvo uses deterministic geometric environments to enable self-evolving 3D spatial reasoning, outperforming existing methods by generating precise, physically valid training data.

arXiv:2604.14144

Computer Vision

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

UI-Zoomer adaptively zooms into GUI elements based on prediction uncertainty, improving localization for small icons and dense layouts without retraining.

arXiv:2604.14113

Machine Learning

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

ClawGUI is an open-source framework that unifies training, evaluation, and deployment for GUI agents, addressing key infrastructure bottlenecks.

arXiv:2604.11784

Computer Vision

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

This paper identifies "Seeing but Not Thinking" in multimodal MoE models, where visual inputs cause routing distraction, and proposes an intervention.

arXiv:2604.08541

Machine Learning

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

SKILL0 is an in-context RL framework that internalizes agent skills into LLM parameters, enabling zero-shot autonomous behavior.

arXiv:2604.02268
