PAFT: Preservation Aware Fine-Tuning for Minimal-Edit Program Repair
Boyang Yang, Zijian Cai, Shunfu Jin, Haoye Tian
TLDR
PAFT is a new fine-tuning method for LLMs that significantly improves program repair by generating minimal, localized code edits, reducing over-editing.
Key contributions
- Introduces PAFT, a preservation-aware fine-tuning method for LLMs to generate minimal-edit program repairs.
- Derives token-level preservation signals from code alignment, combined with full-sequence masking and curriculum learning.
- Achieves up to 65.6% higher pass@1 and up to 32.6% lower average edit distance than standard supervised fine-tuning.
- Outperforms strong baselines like AdaPatcher, yielding smaller, more localized patches without inference overhead.
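The core idea behind the preservation signal is alignment: by matching the buggy and fixed token sequences, each target token can be labeled as preserved (kept from the buggy code) or edited. The paper's exact alignment algorithm and label format are not specified in this digest, so the following is a minimal sketch using Python's `difflib.SequenceMatcher` as a stand-in aligner:

```python
import difflib

def preservation_labels(buggy_tokens, fixed_tokens):
    """Label each token of the fixed code as preserved (1) or edited (0)
    by aligning the buggy and fixed token sequences.

    Sketch only: the aligner (difflib) and the binary label format are
    assumptions, not the paper's exact procedure.
    """
    sm = difflib.SequenceMatcher(a=buggy_tokens, b=fixed_tokens, autojunk=False)
    labels = [0] * len(fixed_tokens)
    for block in sm.get_matching_blocks():
        for j in range(block.b, block.b + block.size):
            labels[j] = 1  # token carried over unchanged from the buggy code
    return labels

buggy = "if ( x > 0 ) return x ;".split()
fixed = "if ( x >= 0 ) return x ;".split()
print(preservation_labels(buggy, fixed))  # → [1, 1, 1, 0, 1, 1, 1, 1, 1]
```

Here only the `>` → `>=` token is marked as edited; everything else is flagged for preservation, which is exactly the kind of token-level signal standard fine-tuning lacks.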
Why it matters
LLMs often over-edit code during program repair, increasing review and maintenance costs. PAFT addresses this by explicitly training models to preserve stable code context. This leads to more precise, localized, and plausible patches, making automated repair more practical and cost-effective for developers.
Original Abstract
Large language models (LLMs) are effective for automated program repair, but plausible patches that pass the full test suite often rewrite more code than necessary, increasing review and maintenance costs. This over-editing is common because most bugs are localized, while standard supervised fine-tuning provides no explicit signal about which tokens should be preserved and which should be changed. We propose PAFT, a preservation-aware fine-tuning method for minimal-edit program repair. PAFT derives token-level preservation signals by aligning buggy and fixed code, combines them with full-sequence masking, and applies an edit-difficulty curriculum. Across Defects4J and HumanEval-Java, PAFT improves pass@1 by up to 65.6% over standard supervised fine-tuning (StdFT) while reducing average edit distance (AED) by up to 32.6%. On Defects4J with DeepSeek-Coder-6.7B, PAFT also outperforms AdaPatcher, a strong preference-based repair baseline, improving pass@1 from 5.9% to 10.1% while reducing median AED from 61.0 to 42.0. Overall, PAFT preserves stable context and concentrates edits on faulty regions, yielding smaller, more localized, plausible patches without inference-time search, reranking, or post-processing.