ArXiv TLDR

Amanda Askell

6 papers

Natural Language Processing

Discovering Language Model Behaviors with Model-Written Evaluations

This paper introduces a method to automatically generate high-quality evaluations using language models themselves, revealing new and unexpected behaviors as models scale.

2212.09251
Natural Language Processing

Constitutional AI: Harmlessness from AI Feedback

Constitutional AI trains harmless AI assistants using AI-generated feedback guided by a set of human-defined principles, minimizing the need for human-labeled data.

2212.08073
Natural Language Processing

Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback

This paper demonstrates that reinforcement learning from human feedback (RLHF) can effectively fine-tune language models to be both helpful and harmless, improving performance across NLP tasks while maintaining specialized skills.

2204.05862
Natural Language Processing

Training language models to follow instructions with human feedback

This paper presents InstructGPT, a method to align language models with user intent by fine-tuning GPT-3 using human feedback, resulting in more truthful, helpful, and less toxic outputs.

2203.02155
Computer Vision

Learning Transferable Visual Models From Natural Language Supervision

This paper presents CLIP, a model that learns versatile visual representations by training on 400 million image-text pairs, enabling zero-shot transfer to diverse vision tasks without task-specific training.

2103.00020
Natural Language Processing

Language Models are Few-Shot Learners

GPT-3, a 175 billion parameter language model, demonstrates strong few-shot learning abilities across diverse NLP tasks without task-specific fine-tuning.

2005.14165

📬 Weekly AI Paper Digest

Get the top 10 AI/ML arXiv papers from the week: summarized, scored, and delivered to your inbox every Monday.