Ruqi Zhang
3 papers · Latest:
Machine Learning
Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity
UCPO improves diversity in RLVR by penalizing non-uniform distributions over correct solutions, boosting Pass@K while maintaining Pass@1.
2605.00365
Machine Learning
Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control
Entrocraft, a new rejection-sampling method, precisely controls entropy in LLM RL, preventing performance saturation and significantly boosting training gains.
2604.26326
Machine Learning
Slithering Through Gaps: Capturing Discrete Isolated Modes via Logistic Bridging
HiSS is a novel Gibbs sampler that uses logistic bridging to efficiently capture discrete isolated modes in complex, high-dimensional distributions.
2604.10821