Chengwei Qin

2 papers · Latest: April 30, 2026

PRISM: Pre-alignment via Black-box On-policy Distillation for Multimodal Reinforcement Learning

PRISM introduces a black-box on-policy distillation stage that pre-aligns large multimodal models, mitigating the distributional drift between supervised fine-tuning (SFT) and reinforcement learning with verifiable rewards (RLVR) to improve performance.

arXiv 2604.28123 · Apr 30, 2026

Natural Language Processing

SafeReview: Defending LLM-based Review Systems Against Adversarial Hidden Prompts

SafeReview defends LLM-based peer review systems against adversarial hidden prompts using a co-evolving generator-defender framework.

arXiv 2604.26506 · Apr 29, 2026
