Jing Shao

5 papers · Latest: May 12, 2026

ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents

ToolCUA uses a staged training paradigm to teach Computer Use Agents to orchestrate GUI actions and high-level tools along optimal paths, achieving new state-of-the-art results.

2605.12481 · May 12, 2026

Natural Language Processing

WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation

WildClawBench introduces a new benchmark for evaluating long-horizon, real-world agents using native runtimes and real tools.

2605.10912 · May 11, 2026

Artificial Intelligence

Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX

Introduces ATBench-Claw and ATBench-CodeX, new benchmarks for evaluating and diagnosing safety in agent trajectories for OpenClaw and OpenAI Codex.

2604.14858 · Apr 16, 2026

Natural Language Processing

Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control

Researchers found a valence-arousal subspace in LLMs, enabling precise control over emotional output, refusal, and sycophancy across models.

2604.03147 · Apr 3, 2026

Machine Learning

Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization

MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.

2310.03708 · Oct 5, 2023
