Bing Liu

2 papers · Latest: May 12, 2026

Reward Hacking in Rubric-Based Reinforcement Learning

This paper investigates reward hacking in rubric-based RL, finding that even strong verifiers do not prevent it when the rubrics themselves are flawed, which leads to declines in output quality.

2605.12474 · May 12, 2026

Artificial Intelligence

The Llama 3 Herd of Models

Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.

2407.21783 · Jul 31, 2024
