Tom Henighan
5 papers · Latest:
Discovering Language Model Behaviors with Model-Written Evaluations
This paper introduces a method to automatically generate high-quality evaluations using language models themselves, revealing new and unexpected behaviors as models scale.
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI trains harmless AI assistants using AI-generated feedback guided by a set of human-defined principles, minimizing the need for human-labeled data.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
This paper demonstrates that reinforcement learning from human feedback (RLHF) can effectively fine-tune language models to be both helpful and harmless, improving performance across NLP tasks while maintaining specialized skills.
Language Models are Few-Shot Learners
GPT-3, a 175 billion parameter language model, demonstrates strong few-shot learning abilities across diverse NLP tasks without task-specific fine-tuning.
Scaling Laws for Neural Language Models
This paper identifies power-law scaling relationships between language model performance and factors like model size, dataset size, and compute, enabling optimal training strategies under fixed compute budgets.