Chaojun Xiao

3 papers · Latest: May 11, 2026

DECO: Sparse Mixture-of-Experts with Dense-Comparable Performance on End-Side Devices

DECO is a sparse mixture-of-experts (MoE) model that matches dense-model performance on end-side devices while offering a 3x speedup and reduced storage overhead.

2605.10933 · May 11, 2026

Computer Vision

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

PRISM introduces a black-box on-policy distillation stage that pre-aligns large multimodal models, mitigating the distributional drift between supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) to improve performance.

2604.28123 · Apr 30, 2026

Machine Learning

Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe

This paper investigates the dynamics of on-policy distillation (OPD) in LLMs, identifying the conditions under which it succeeds, the token-level mechanisms behind it, and practical recovery strategies.

2604.13016 · Apr 14, 2026