Xunliang Cai

5 papers · Latest: May 13, 2026

Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

RDPO improves multi-objective and mixed-reward RL by decorrelating rewards and stabilizing advantage allocation for diverse reward types.

2605.13641 · May 13, 2026

Software Engineering

SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle

SWE-Cycle introduces a new benchmark and SWE-Judge evaluation system to accurately assess autonomous code agents across the complete software issue resolution cycle.

2605.13139 · May 13, 2026

Natural Language Processing

General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks

General365 is a new benchmark assessing LLMs' general reasoning, revealing their domain-dependent abilities and significant room for improvement beyond specialized tasks.

2604.11778 · Apr 13, 2026

Computer Vision

LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment

LARY introduces a benchmark and dataset for evaluating latent action representations, showing general visual models excel and latent spaces align better with physical actions.

2604.11689 · Apr 13, 2026

Machine Learning

SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization

SKILL0 is an in-context RL framework that internalizes agent skills into LLM parameters, enabling zero-shot autonomous behavior.

2604.02268 · Apr 2, 2026
