
Solving Physics Olympiad via Reinforcement Learning on Physics Simulators

2604.11805

Mihir Prabhudesai, Aryan Satpathy, Yangmin Li, Zheyang Qin, Nikash Bhardwaj + 4 more

cs.LG, cs.AI, cs.CV, cs.RO

TLDR

LLMs trained with reinforcement learning on synthetic question-answer pairs generated by physics simulators transfer zero-shot to real-world physics benchmarks, improving performance on International Physics Olympiad (IPhO) problems by 5-10 percentage points.

Key contributions

  • Physics simulators generate scalable synthetic QA data for LLM training.
  • LLMs are trained with reinforcement learning on this novel synthetic data.
  • Demonstrates zero-shot sim-to-real transfer to real-world physics problems.
  • Improves IPhO performance by 5-10 percentage points across model sizes, training solely on simulated data.
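
To make the data-generation idea concrete, here is a minimal sketch of turning one randomized simulated scene into a question-answer pair. It is an illustration only, not the authors' pipeline: the toy projectile "simulator", the function names `simulate_range` and `make_qa_pair`, the parameter ranges, and the question template are all assumptions chosen for brevity, whereas the paper uses full physics engines.

```python
import math
import random


def simulate_range(speed, angle_deg, g=9.81, dt=1e-4):
    """Toy stand-in for a physics engine: step a point mass launched from
    the ground until it lands and return the horizontal distance (m)."""
    theta = math.radians(angle_deg)
    x, y = 0.0, 0.0
    vx, vy = speed * math.cos(theta), speed * math.sin(theta)
    while True:
        # Semi-implicit Euler step: update velocity, then position.
        vy_new = vy - g * dt
        x_new = x + vx * dt
        y_new = y + vy_new * dt
        if y_new < 0.0:
            # Linearly interpolate to the point where the ball crosses y = 0.
            frac = y / (y - y_new)
            return x + frac * (x_new - x)
        x, y, vy = x_new, y_new, vy_new


def make_qa_pair(rng):
    """Sample a random scene (hypothetical parameter ranges) and build one
    QA pair whose answer comes from the simulation, not from a formula."""
    speed = round(rng.uniform(5.0, 20.0), 1)   # launch speed in m/s
    angle = rng.randint(20, 70)                # launch angle in degrees
    question = (
        f"A ball is launched from level ground at {speed} m/s, "
        f"{angle} degrees above the horizontal. Ignoring air resistance "
        f"and taking g = 9.81 m/s^2, how far away does it land (in m)?"
    )
    answer = round(simulate_range(speed, angle), 2)
    return {"question": question, "answer": answer,
            "speed": speed, "angle": angle}


if __name__ == "__main__":
    pair = make_qa_pair(random.Random(0))
    print(pair["question"])
    print("simulated answer:", pair["answer"])
```

Because the answer is read off the simulation, the same pattern extends to scenes with no closed-form solution, which is where a simulator adds value over hand-written problems.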

Why it matters

This paper addresses the critical bottleneck of limited QA data for training reasoning LLMs in physics. It demonstrates that physics simulators can serve as scalable data generators, enabling LLMs to acquire deep physical reasoning skills. This approach opens new avenues for training advanced reasoning models in scientific domains beyond internet data limitations.

Original Abstract

We have witnessed remarkable advances in LLM reasoning capabilities with the advent of DeepSeek-R1. However, much of this progress has been fueled by the abundance of internet question-answer (QA) pairs, a major bottleneck going forward, since such data is limited in scale and concentrated mainly in domains like mathematics. In contrast, other sciences such as physics lack large-scale QA datasets to effectively train reasoning-capable models. In this work, we show that physics simulators can serve as a powerful alternative source of supervision for training LLMs for physical reasoning. We generate random scenes in physics engines, create synthetic question-answer pairs from simulated interactions, and train LLMs using reinforcement learning on this synthetic data. Our models exhibit zero-shot sim-to-real transfer to real-world physics benchmarks: for example, training solely on synthetic simulated data improves performance on IPhO (International Physics Olympiad) problems by 5-10 percentage points across model sizes. These results demonstrate that physics simulators can act as scalable data generators, enabling LLMs to acquire deep physical reasoning skills beyond the limitations of internet-scale QA data. Code available at: https://sim2reason.github.io/.
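
Reinforcement learning on such data needs a reward that can be checked automatically against the simulator's answer. The abstract does not specify the reward used, so the following is only one plausible sketch: a binary reward that extracts the last number in a model completion and compares it to the simulated target within a relative tolerance. The name `numeric_reward`, the 2% tolerance, and the "last number wins" convention are assumptions made for illustration.

```python
import re

# Matches integers, decimals, and scientific notation such as -3.5 or 1e3.
_NUMBER = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def numeric_reward(completion, target, rel_tol=0.02):
    """Return 1.0 if the last number in `completion` is within `rel_tol`
    (relative error) of the simulator-derived `target`, else 0.0."""
    matches = _NUMBER.findall(completion)
    if not matches:
        return 0.0  # no numeric answer found
    predicted = float(matches[-1])
    scale = max(abs(target), 1e-9)  # guard against division by zero
    return 1.0 if abs(predicted - target) / scale <= rel_tol else 0.0


if __name__ == "__main__":
    print(numeric_reward("So the ball lands about 12.3 m away", 12.34))
    print(numeric_reward("I am not sure.", 12.34))
```

A tolerance-based check like this allows for rounding differences between the model's arithmetic and the simulator's numerical integration, at the cost of occasionally accepting a slightly wrong answer.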
