John Schulman
4 papers · Latest:
GPT-4 Technical Report
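GPT-4 is a large-scale multimodal Transformer-based model that accepts image and text inputs and produces text outputs. It exhibits human-level performance on various professional and academic benchmarks, and post-training alignment improves its factuality and adherence to desired behavior.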
Training language models to follow instructions with human feedback
This paper presents InstructGPT, a family of models obtained by fine-tuning GPT-3 with human feedback to align language models with user intent, resulting in more truthful, more helpful, and less toxic outputs.
WebGPT: Browser-assisted question-answering with human feedback
WebGPT fine-tunes GPT-3 to answer complex questions by browsing the web and using human feedback to improve factual accuracy and answer quality.
Proximal Policy Optimization Algorithms
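Proximal Policy Optimization (PPO) is a family of policy gradient methods that optimize a clipped surrogate objective over multiple epochs of minibatch updates. It is simpler to implement than trust region policy optimization (TRPO) and shows better empirical sample complexity across benchmark reinforcement learning tasks.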