ArXiv TLDR

Yueting Zhuang

6 papers · Latest:

Computer Vision

SpatialEvo: Self-Evolving Spatial Intelligence via Deterministic Geometric Environments

SpatialEvo uses deterministic geometric environments to enable self-evolving 3D spatial reasoning, outperforming existing methods by generating precise, physically valid training data.

2604.14144
Computer Vision

UI-Zoomer: Uncertainty-Driven Adaptive Zoom-In for GUI Grounding

UI-Zoomer adaptively zooms into GUI elements based on prediction uncertainty, improving localization for small icons and dense layouts without retraining.

2604.14113
Computer Vision

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

LMMs struggle with object-level tasks; this paper reviews how object-centric vision enhances LMMs for precise understanding, segmentation, editing, and generation.

2604.11789
Machine Learning

ClawGUI: A Unified Framework for Training, Evaluating, and Deploying GUI Agents

ClawGUI is an open-source framework that unifies training, evaluation, and deployment for GUI agents, addressing key infrastructure bottlenecks.

2604.11784
Computer Vision

Seeing but Not Thinking: Routing Distraction in Multimodal Mixture-of-Experts

This paper identifies "Seeing but Not Thinking" in multimodal MoE models, where visual inputs cause routing distraction, and proposes an intervention.

2604.08541
Machine Learning

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

SKILL0 is an in-context RL framework that internalizes agent skills into LLM parameters, enabling zero-shot autonomous behavior.

2604.02268
