Yujun Zhou
2 papers · Latest:
Machine Learning
Too Correct to Learn: Reinforcement Learning on Saturated Reasoning Data
This paper introduces CUTS and Mixed-CUTS to prevent mode collapse in RL for LLMs on saturated reasoning data, boosting generalization.
2604.18493
Natural Language Processing
PolicyLLM: Towards Excellent Comprehension of Public Policy for Large Language Models
PolicyLLM introduces PolicyBench, a cross-system benchmark, and PolicyMoE, an MoE model, to evaluate and enhance LLM comprehension of public policy.
2604.12995