Yuanda Xu
2 papers · Latest:
Machine Learning
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
A new principle for LM post-training uses sparse rewards for strong teachers and dense distillation for students, outperforming direct sparse RL.
2605.12483
Machine Learning
TIP: Token Importance in On-Policy Distillation
TIP introduces a two-axis taxonomy for token importance in on-policy distillation, significantly improving efficiency and reducing memory usage.
2604.14084