Amanda Askell
6 papers · Latest:
Discovering Language Model Behaviors with Model-Written Evaluations
This paper introduces a method to automatically generate high-quality evaluations using language models themselves, revealing new and unexpected behaviors as models scale.
Constitutional AI: Harmlessness from AI Feedback
Constitutional AI trains harmless AI assistants using AI-generated feedback guided by a set of human-defined principles, minimizing the need for human-labeled data.
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
This paper demonstrates that reinforcement learning from human feedback (RLHF) can effectively fine-tune language models to be both helpful and harmless, improving performance across NLP tasks while maintaining specialized skills.
Training language models to follow instructions with human feedback
This paper presents InstructGPT, a method to align language models with user intent by fine-tuning GPT-3 using human feedback, resulting in more truthful, helpful, and less toxic outputs.
Learning Transferable Visual Models From Natural Language Supervision
This paper presents CLIP, a model that learns versatile visual representations by training on 400 million image-text pairs, enabling zero-shot transfer to diverse vision tasks without task-specific training.
Language Models are Few-Shot Learners
GPT-3, a 175 billion parameter language model, demonstrates strong few-shot learning abilities across diverse NLP tasks without task-specific fine-tuning.