Jing Shao
5 papers · Latest:
ToolCUA: Towards Optimal GUI-Tool Path Orchestration for Computer Use Agents
ToolCUA enables Computer Use Agents to optimally orchestrate GUI actions and high-level tools using a staged training paradigm, achieving new SOTA.
WildClawBench: A Benchmark for Real-World, Long-Horizon Agent Evaluation
WildClawBench introduces a new benchmark for evaluating long-horizon, real-world agents using native runtimes and real tools.
Benchmarks for Trajectory Safety Evaluation and Diagnosis in OpenClaw and Codex: ATBench-Claw and ATBench-CodeX
Introduces ATBench-Claw and ATBench-CodeX, new benchmarks for evaluating and diagnosing safety in agent trajectories for OpenClaw and OpenAI Codex.
Valence-Arousal Subspace in LLMs: Circular Emotion Geometry and Multi-Behavioral Control
Researchers found a valence-arousal subspace in LLMs, enabling precise control over emotional output, refusal, and sycophancy across models.
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
MODPO is a novel, RL-free method for aligning language models to multiple human preferences simultaneously, achieving stable and efficient optimization across diverse objectives.