Ziyuan Zhuang

2 papers · Latest: May 13, 2026

Prefix Teach, Suffix Fade: Local Teachability Collapse in Strong-to-Weak On-Policy Distillation

A new on-policy distillation method, "Prefix Teach, Suffix Fade," improves strong-to-weak model training by focusing supervision on locally teachable trajectory segments.

2605.13643 · May 13, 2026

Machine Learning

Multi-Objective and Mixed-Reward Reinforcement Learning via Reward-Decorrelated Policy Optimization

RDPO improves multi-objective and mixed-reward RL by decorrelating rewards and stabilizing advantage allocation for diverse reward types.

2605.13641 · May 13, 2026
