Zhipeng Wang
3 papers · Latest:
Machine Learning
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
A new principle for LM post-training uses sparse rewards for strong teachers and dense distillation for students, outperforming direct sparse RL.
2605.12483
Cryptography & Security
Five Attacks on x402 Agentic Payment Protocol
This paper identifies five practical attacks on the x402 agentic payment protocol, revealing critical vulnerabilities in its design and implementation.
2605.11781
Machine Learning
TIP: Token Importance in On-Policy Distillation
TIP introduces a two-axis taxonomy for token importance in on-policy distillation, significantly improving efficiency and reducing memory usage.
2604.14084