ArXiv TLDR

Fine-Tuning Small Reasoning Models for Quantum Field Theory

arXiv: 2604.18936

Nathaniel S. Woodward, Zhiqi Gao, Yurii Kvasiuk, Kendrick M. Smith, Frederic Sala, et al.

cs.LG · cs.AI · hep-ph · hep-th

TLDR

This paper fine-tunes small (7B) reasoning models for Quantum Field Theory using a novel data generation pipeline, and analyzes how domain-specific physics reasoning develops during training.

Key contributions

  • Fine-tuned small (7B) reasoning models specifically for Quantum Field Theory (QFT).
  • Developed a robust data generation pipeline, producing 2,500+ synthetic QFT problems plus a curated set of human-adapted problems.
  • Conducted both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, analyzing how reasoning errors in model chains-of-thought evolve.
  • Publicly released the data pipeline, verifiable QFT training data (see the verification sketch below), and reasoning traces.
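The "verifiable" aspect is what makes RL on this data possible: each problem's final answer can be checked automatically. The sketch below illustrates one way such a check could work; it is not the authors' released verifier, and the `qft_reward` function and the sympy-based symbolic comparison are assumptions for illustration.

```python
import sympy as sp

def qft_reward(model_answer: str, reference_answer: str) -> float:
    """Score 1.0 if the model's final expression is symbolically equal
    to the reference, else 0.0; unparseable output also scores 0.0."""
    try:
        diff = sp.simplify(sp.sympify(model_answer) - sp.sympify(reference_answer))
    except (sp.SympifyError, TypeError):
        return 0.0
    return 1.0 if diff == 0 else 0.0

# Two equivalent forms of the same expression pass the check:
print(qft_reward("p**2 - m**2", "(p - m)*(p + m)"))  # 1.0
print(qft_reward("p**2 + m**2", "(p - m)*(p + m)"))  # 0.0
```

A symbolic check like this rewards any algebraically equivalent form of the answer, which matters in QFT where the same amplitude can be written many ways; exact string match or numeric tolerance are simpler variants for purely numerical answers.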

Why it matters

This work presents the first academic fine-tuning study of small reasoning models dedicated to theoretical physics, with QFT as the primary domain. It provides crucial insight into how domain-specific reasoning develops in LLMs and contributes open-source resources for future research at the intersection of AI and physics.

Original Abstract

Despite the growing application of Large Language Models (LLMs) to theoretical physics, there is little academic exploration into how domain-specific physics reasoning ability develops while training these models. To investigate this, we perform the first academic fine-tuning study of small (7B-parameter) reasoning models dedicated specifically to theoretical physics. Because open-source verifiable training data required to train such capabilities is scarce, we developed a robust data generation pipeline that can both create synthetic problems and make existing human-authored problems suitable for model training. Selecting Quantum Field Theory (QFT) as our primary domain, we generated over 2,500 synthetic problems alongside a curated collection of human-adapted problems sourced from arXiv and standard pedagogical resources. We conduct both Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) experiments, benchmarking performance gains as well as generalization to other physics domains. We perform an extensive analysis of model chains-of-thought before and after fine-tuning to understand how reasoning errors evolve during RL and SFT. Finally, we publicly release our data pipeline, verifiable QFT training data, and $\sim$200M tokens of QFT reasoning traces.
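For concreteness, a verifiable training record in this setting would pair a problem statement with a machine-checkable answer and, for SFT, a reasoning trace. The layout below is hypothetical; the field names are assumptions, not the released dataset's schema.

```python
# Hypothetical layout of one verifiable QFT training record.
# Field names are illustrative; consult the released dataset for the real schema.
record = {
    "problem": (
        "Compute the superficial degree of divergence D of a one-loop "
        "diagram in phi^4 theory in d = 4 with E = 4 external legs."
    ),
    "answer": "0",             # machine-checkable final answer
    "answer_type": "integer",  # tells the verifier how to compare
    "source": "synthetic",     # synthetic vs. human-adapted (arXiv/textbook)
    "reasoning_trace": "In phi^4 theory in d = 4, D = 4 - E, so D = 4 - 4 = 0.",
}
```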
