Bing Liu
2 papers · Latest:
Artificial Intelligence
Reward Hacking in Rubric-Based Reinforcement Learning
This paper investigates reward hacking in rubric-based RL and finds that even strong verifiers do not prevent problems when the rubrics themselves are flawed, leading to quality declines.
2605.12474
Artificial Intelligence
The Llama 3 Herd of Models
Llama 3 is a new family of large multilingual foundation models excelling in language, coding, reasoning, and multimodal tasks, rivaling GPT-4 in quality and offering extensive public releases.
2407.21783