Anamika Lochab

2 papers · Latest: May 1, 2026

Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity

UCPO improves diversity in RLVR by penalizing non-uniform distributions over correct solutions, boosting Pass@K while maintaining Pass@1.

2605.00365 · May 1, 2026

Machine Learning

Addressing Performance Saturation for LLM RL via Precise Entropy Curve Control

Entrocraft, a new rejection-sampling method, precisely controls entropy in LLM RL, preventing performance saturation and significantly boosting training gains.

2604.26326 · Apr 29, 2026
