Chuanyu Qin
3 papers · Latest:
Machine Learning
Near-Future Policy Optimization
NPO and AutoNPO enhance Reinforcement Learning with Verifiable Rewards (RLVR) by leveraging near-future policy checkpoints for improved off-policy learning.
2604.20733
Computer Vision
Find, Fix, Reason: Context Repair for Video Reasoning
Find, Fix, Reason (FFR) introduces a teacher-student model for video reasoning that repairs context by providing missing spatiotemporal evidence.
2604.16243
Machine Learning
Self-Distilled RLVR
RLSD combines RLVR with self-distillation to provide fine-grained updates and reliable directions, improving LLM training stability and convergence.
2604.03128