Xunliang Cai
5 papers · Latest:
Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization
RDPO improves multi-objective and mixed-reward RL by decorrelating rewards and stabilizing advantage allocation for diverse reward types.
SWE-Cycle: Benchmarking Code Agents across the Complete Issue Resolution Cycle
SWE-Cycle introduces a new benchmark and SWE-Judge evaluation system to accurately assess autonomous code agents across the complete software issue resolution cycle.
General365: Benchmarking General Reasoning in Large Language Models Across Diverse and Challenging Tasks
General365 is a new benchmark assessing LLMs' general reasoning, revealing their domain-dependent abilities and significant room for improvement beyond specialized tasks.
LARY: A Latent Action Representation Yielding Benchmark for Generalizable Vision-to-Action Alignment
LARY introduces a benchmark and dataset for evaluating latent action representations, showing general visual models excel and latent spaces align better with physical actions.
SKILL0: In-Context Agentic Reinforcement Learning for Skill Internalization
SKILL0 is an in-context RL framework that internalizes agent skills into LLM parameters, enabling zero-shot autonomous behavior.