John Schulman

4 papers · Latest: March 15, 2023

GPT-4 Technical Report

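GPT-4 is a large-scale multimodal Transformer-based model that accepts image and text inputs and produces text outputs. It exhibits human-level performance on various professional and academic benchmarks, and a post-training alignment process improves its factuality and adherence to desired behavior.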

arXiv:2303.08774 · Mar 15, 2023

Natural Language Processing

Training language models to follow instructions with human feedback

This paper presents InstructGPT, a family of models produced by fine-tuning GPT-3 with human feedback to align it with user intent, resulting in outputs that are more truthful, more helpful, and less toxic.

arXiv:2203.02155 · Mar 4, 2022

Natural Language Processing

WebGPT: Browser-assisted question-answering with human feedback

WebGPT fine-tunes GPT-3 to answer long-form questions using a text-based web-browsing environment, and uses human feedback to improve the factual accuracy and quality of its answers.

arXiv:2112.09332 · Dec 17, 2021

Machine Learning

Proximal Policy Optimization Algorithms

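Proximal Policy Optimization (PPO) is a family of policy gradient methods that is simpler to implement than trust region policy optimization (TRPO) and empirically achieves better sample complexity on benchmark reinforcement learning tasks, including simulated robotic locomotion and Atari games.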

arXiv:1707.06347 · Jul 20, 2017
