Zhipeng Wang
3 papers · Latest:
Machine Learning
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training
A new principle for LM post-training uses sparse rewards for strong teachers and dense distillation for students, outperforming direct sparse RL.
2605.12483
Cryptography & Security
Five Attacks on x402 Agentic Payment Protocol
This paper identifies five practical attacks on the x402 agentic payment protocol, revealing critical vulnerabilities in its design and implementation.
2605.11781
Machine Learning
TIP: Token Importance in On-Policy Distillation
TIP introduces a two-axis taxonomy for token importance in on-policy distillation, significantly improving efficiency and reducing memory usage.
2604.14084