Dario Amodei
7 papers · Latest:
Discovering Language Model Behaviors with Model-Written Evaluations
This paper introduces a method to automatically generate high-quality evaluations using language models themselves, revealing new and unexpected behaviors as models scale.
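A hypothetical sketch of the generate-then-filter recipe the summary describes: one model writes candidate evaluation statements, a preference model filters them for quality, and the finished set is administered to a subject model. `lm_generate`, `pm_quality_score`, and `subject_model` are assumed stand-ins for a generic LM API, not names from the paper.

```python
def build_eval(topic, n, lm_generate, pm_quality_score, threshold=0.7):
    """Hypothetical sketch: generate candidate eval statements, then filter."""
    prompt = (f"Write a statement that someone who is {topic} "
              "would agree with.\nStatement:")
    candidates = [lm_generate(prompt) for _ in range(n)]
    # Keep only statements the preference model rates as high-quality/on-topic.
    return [s for s in candidates if pm_quality_score(topic, s) > threshold]

def administer(statements, subject_model):
    # Ask the subject model to agree or disagree; the fraction of "Yes"
    # answers is the behavioral score reported for that model.
    answers = [subject_model(f'Is the following something you would say?\n'
                             f'"{s}"\nAnswer Yes or No:') for s in statements]
    return sum(a.strip().startswith("Yes") for a in answers) / len(answers)
```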
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI trains harmless AI assistants using AI-generated feedback guided by a set of human-defined principles, minimizing the need for human-labeled data.
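A rough sketch of the supervised critique-and-revision phase, assuming a generic `lm` callable; the constitution itself is just a list of human-written principles, and the AI-generated revisions (rather than human labels) become the fine-tuning data.

```python
import random

# Illustrative principles; the paper uses a longer human-written list.
CONSTITUTION = [
    "Choose the response that is least harmful.",
    "Choose the response that is most honest and helpful.",
]

def critique_and_revise(lm, prompt, rounds=2):
    """Hypothetical sketch of Constitutional AI's supervised phase."""
    response = lm(prompt)
    for _ in range(rounds):
        principle = random.choice(CONSTITUTION)
        critique = lm(f"Critique this response according to the principle: "
                      f"{principle}\nResponse: {response}\nCritique:")
        response = lm(f"Rewrite the response to address the critique.\n"
                      f"Response: {response}\nCritique: {critique}\nRevision:")
    return response  # revised responses become supervised fine-tuning targets
```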
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
This paper demonstrates that reinforcement learning from human feedback (RLHF) can fine-tune language models to be both helpful and harmless, improving performance on almost all NLP evaluations while remaining compatible with specialized skills such as coding and summarization.
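One way to make "reinforcement learning from human feedback" concrete: a common form of the RL objective in this line of work (a sketch, not necessarily the paper's exact formulation) maximizes a learned preference-model reward r_φ while a KL penalty keeps the policy π_θ close to its pre-RL initialization π_0:

```latex
\max_{\pi_\theta} \;
\mathbb{E}_{x \sim \mathcal{D},\, y \sim \pi_\theta(\cdot \mid x)}
\big[ r_\phi(x, y) \big]
\;-\; \beta \, \mathbb{E}_{x}\!\left[
D_{\mathrm{KL}}\!\big(\pi_\theta(\cdot \mid x) \,\|\, \pi_0(\cdot \mid x)\big)
\right]
```

Here β trades off reward maximization against drift from the original model's distribution.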
Evaluating Large Language Models Trained on Code
Codex, a GPT model fine-tuned on GitHub code, significantly outperforms prior models in generating correct Python programs from docstrings, demonstrating strong code synthesis capabilities.
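The paper's headline metric is pass@k: the probability that at least one of k generated samples passes the unit tests. It gives a numerically stable unbiased estimator from n samples with c correct, which the snippet below reproduces (function and variable names mine):

```python
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of pass@k from n samples, c of which are correct.

    Evaluates 1 - C(n-c, k) / C(n, k) as a stable running product.
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return float(1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1)))
```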
Language Models are Few-Shot Learners
GPT-3, a 175 billion parameter language model, demonstrates strong few-shot learning abilities across diverse NLP tasks without task-specific fine-tuning.
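"Few-shot" here means conditioning on a handful of in-context demonstrations, with no gradient updates. A minimal illustration of how such a prompt is assembled, using translation pairs of the kind shown in the paper's figures:

```python
def few_shot_prompt(examples, query):
    """Build a K-shot prompt: K demonstrations followed by the query."""
    demos = "\n".join(f"English: {en}\nFrench: {fr}" for en, fr in examples)
    return f"{demos}\nEnglish: {query}\nFrench:"

prompt = few_shot_prompt(
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "peppermint",
)
# The model is expected to continue the pattern with the translation.
```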
Scaling Laws for Neural Language Models
This paper identifies power-law scaling relationships between language model performance and factors like model size, dataset size, and compute, enabling optimal training strategies under fixed compute budgets.
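The headline result is that test loss falls as a power law in each resource when the other two are not bottlenecks; in the paper's notation, with fitted exponents of roughly α_N ≈ 0.076, α_D ≈ 0.095, and α_C ≈ 0.050:

```latex
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C_{\min}) \approx \left(\frac{C_c}{C_{\min}}\right)^{\alpha_C}
```

Here N is non-embedding parameter count, D dataset size in tokens, and C_min compute along the compute-efficient frontier; N_c, D_c, and C_c are fitted constants.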
Deep reinforcement learning from human preferences
This paper demonstrates that deep reinforcement learning agents can be effectively trained using human preferences as feedback instead of explicit reward functions, enabling complex task learning with minimal human input.
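Concretely, the method fits a reward estimate r̂ to pairwise human comparisons of trajectory segments with a Bradley-Terry-style model; following the paper, the predicted probability that segment σ¹ is preferred over σ² is:

```latex
\hat{P}\big[\sigma^1 \succ \sigma^2\big] =
\frac{\exp \sum_t \hat{r}\big(s^1_t, a^1_t\big)}
     {\exp \sum_t \hat{r}\big(s^1_t, a^1_t\big)
      + \exp \sum_t \hat{r}\big(s^2_t, a^2_t\big)}
```

r̂ is trained by minimizing cross-entropy between these predictions and the human labels, and a standard deep RL agent then maximizes the learned r̂ in place of a hand-specified reward.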