Ruqi Zhang

3 papers · Latest: May 1, 2026

Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity

UCPO improves diversity in RLVR by penalizing non-uniform distributions over correct solutions, boosting Pass@K while maintaining Pass@1.

2605.00365 · May 1, 2026

Machine Learning

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Entrocraft, a new rejection-sampling method, precisely controls the entropy curve during LLM RL training, preventing performance saturation and substantially increasing training gains.

2604.26326 · Apr 29, 2026

Machine Learning

Slithering Through Gaps: Capturing Discrete Isolated Modes via Logistic Bridging

HiSS is a new Gibbs sampler that uses logistic bridging to efficiently capture isolated modes in complex, high-dimensional discrete distributions.

2604.10821 · Apr 12, 2026
