Yujun Zhou

2 papers · Latest: April 20, 2026

Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data

This paper introduces CUTS and Mixed-CUTS to prevent mode collapse when reinforcement learning is applied to LLMs on saturated reasoning data, with the aim of improving generalization.

PolicyLLM introduces PolicyBench, a cross-system benchmark, and PolicyMoE, a mixture-of-experts (MoE) model, to evaluate and improve LLM comprehension of public policy.
