Dario Amodei
7 papers · Latest:
Discovering Language Model Behaviors with Model-Written Evaluations
This paper introduces a method to automatically generate high-quality evaluations using language models themselves, revealing new and unexpected behaviors as models scale.
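A hypothetical sketch of the generate-then-filter recipe the summary describes: one model writes candidate evaluation statements, a preference model filters them for quality, and the finished set is administered to a subject model. `lm_generate`, `pm_quality_score`, and `subject_model` are assumed stand-ins for a generic LM API, not names from the paper.

```python
def build_eval(topic, n, lm_generate, pm_quality_score, threshold=0.7):
    """Hypothetical sketch: generate candidate eval statements, then filter."""
    prompt = (f"Write a statement that someone who is {topic} "
              "would agree with.\nStatement:")
    candidates = [lm_generate(prompt) for _ in range(n)]
    # Keep only statements the preference model rates as high-quality/on-topic.
    return [s for s in candidates if pm_quality_score(topic, s) > threshold]

def administer(statements, subject_model):
    # Ask the subject model to agree or disagree; the fraction of "Yes"
    # answers is the behavioral score reported for that model.
    answers = [subject_model(f'Is the following something you would say?\n'
                             f'"{s}"\nAnswer Yes or No:') for s in statements]
    return sum(a.strip().startswith("Yes") for a in answers) / len(answers)
```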
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI trains harmless AI assistants using AI-generated feedback guided by a set of human-defined principles, minimizing the need for human-labeled data.
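A rough sketch of the supervised critique-and-revision phase, assuming a generic `lm` callable; the constitution itself is just a list of human-written principles, and the AI-generated revisions (rather than human labels) become the fine-tuning data.

```python
import random

# Illustrative principles; the paper uses a longer human-written list.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]

def critique_and_revise(lm, prompt, rounds=2):
    """Hypothetical sketch of Constitutional AI's supervised phase."""
    response = lm(prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = lm(f"Critique this response according to the principle: "
                      f"{principle}\nResponse: {response}\nCritique:")
        response = lm(f"Rewrite the response to address the critique.\n"
                      f"Response: {response}\nCritique: {critique}\nRevision:")
    return response  # revised responses become supervised fine-tuning targets
```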
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
This paper demonstrates that reinforcement learning from human feedback (RLHF) can fine-tune language models to be both helpful and harmless, improving performance on almost all NLP evaluations while remaining compatible with specialized skills such as coding and summarization.
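One way to make "reinforcement learning from human feedback" concrete: a common form of the RL objective in this line of work (a sketch, not necessarily the paper's exact formulation) maximizes a learned preference-model reward r_φ while a KL penalty keeps the policy π_θ close to its pre-RL initialization π_0:

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\big[ r_\phi(x, y) \big]
\;-\; \beta \, \mathbb{E}_{x}\!\left[
D_{\mathrm{KL}}\!\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x)\big)
\right]
```

Here β trades off reward maximization against drift from the original model's distribution.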
Evaluating Large Language Models Trained on Code
Codex, a GPT model fine-tuned on GitHub code, significantly outperforms prior models in generating correct Python programs from docstrings, demonstrating strong code synthesis capabilities.
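The paper's headline metric is pass@k: the probability that at least one of k generated samples passes the unit tests. It gives a numerically stable unbiased estimator from n samples with c correct, which the snippet below reproduces (function and variable names mine):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct.

    Evaluates 1 - C(n-c, k) / C(n, k) as a stable running product.
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```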
Language Models are Few-Shot Learners
GPT-3, a 175 billion parameter language model, demonstrates strong few-shot learning abilities across diverse NLP tasks without task-specific fine-tuning.
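"Few-shot" here means conditioning on a handful of in-context demonstrations, with no gradient updates. A minimal illustration of how such a prompt is assembled, using translation pairs of the kind shown in the paper's figures:

```python
def few_shot_prompt(examples, query):
    """Build a K-shot prompt: K demonstrations followed by the query."""
    demos = "\n".join(f"English: {en}\nFrench: {fr}" for en, fr in examples)
    return f"{demos}\nEnglish: {query}\nFrench:"

prompt = few_shot_prompt(
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
# The model is expected to continue the pattern with the translation.
```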
Scaling Laws for Neural Language Models
This paper identifies power-law scaling relationships between language model performance and factors like model size, dataset size, and compute, enabling optimal training strategies under fixed compute budgets.
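The headline result is that test loss falls as a power law in each resource when the other two are not bottlenecks; in the paper's notation, with fitted exponents of roughly α_N ≈ 0.076, α_D ≈ 0.095, and α_C ≈ 0.050:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}
```

Here N is non-embedding parameter count, D dataset size in tokens, and C_min compute along the compute-efficient frontier; N_c, D_c, and C_c are fitted constants.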
Deep reinforcement learning from human preferences
This paper demonstrates that deep reinforcement learning agents can be effectively trained using human preferences as feedback instead of explicit reward functions, enabling complex task learning with minimal human input.
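Concretely, the method fits a reward estimate r̂ to pairwise human comparisons of trajectory segments with a Bradley-Terry-style model; following the paper, the predicted probability that segment σ¹ is preferred over σ² is:

```latex
\hat{P}\big[\sigma^1 \succ \sigma^2\big] =
\frac{\exp \sum_t \hat{r}\big(s^1_t, a^1_t\big)}
     {\exp \sum_t \hat{r}\big(s^1_t, a^1_t\big)
      + \exp \sum_t \hat{r}\big(s^2_t, a^2_t\big)}
```

r̂ is trained by minimizing cross-entropy between these predictions and the human labels, and a standard deep RL agent then maximizes the learned r̂ in place of a hand-specified reward.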