ArXiv TLDR

A Universal Textual Merge Strategy Based on Tokens for Version Control Systems

2604.13813

Qiqi Jason Gu, Mikoláš Janota

cs.SE

TLDR

Summer is a universal token-based merge algorithm for version control systems that reduces conflicts and improves merge accuracy across various document formats.

Key contributions

  • Introduces Summer, a novel token-based merge algorithm for version control systems.
  • Operates independently of document formats and programming languages.
  • Uses token-level string-rewriting and move rules to construct merges.
  • Achieved the highest verbatim accuracy (36%) on ConflictBench against five other merge tools, and ranked second in semantic accuracy.

Why it matters

Traditional line-based merges often create spurious conflicts, while syntax- and semantics-aware methods can lose formatting and depend on language-specific parsers. Summer offers a format-independent alternative: on ConflictBench it achieved 36% accuracy in reproducing developers' merges verbatim, the highest among the tools compared.

Original Abstract

Merging is a core operation in version control systems such as Git, but traditional line-based algorithms often yield spurious conflicts, particularly in the presence of refactorings or parallel edits. While syntax- and semantics-aware merging approaches can reduce conflicts, they introduce drawbacks such as loss of formatting, dependence on language-specific parsers, and limited flexibility across heterogeneous artifacts. To address this gap, we present Summer, a novel textual token-based merge algorithm independent of document formats. Dividing text into tokens, our approach formulates token-level changes in one branch into string-rewriting rules and move rules, and applies these rules to the text of the other branch to construct a merge. Despite being independent of programming languages, our move rules model extracting and inlining functions. We evaluated Summer on ConflictBench, a large benchmark of real-world merge scenarios, comparing it with five pioneering merge tools across Java and non-Java files. Experimental results show that Summer achieved the highest accuracy (36%) in reproducing merges verbatim identical to developers', and ranked second in semantic accuracy.
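
The abstract's core idea (turn one branch's token-level changes into rewriting rules, then apply them to the other branch) can be illustrated with a small Python sketch. This is not Summer's actual algorithm: the tokenizer, the use of `difflib.SequenceMatcher`, the context-anchored rule format, and the names `tokenize`, `derive_rules`, and `merge` are all assumptions made for illustration, and move rules are not modeled at all.

```python
import re
from difflib import SequenceMatcher

def tokenize(text):
    # Assumed tokenizer: words, whitespace runs, and single punctuation marks.
    # Joining the tokens reproduces the text exactly, so formatting survives.
    return re.findall(r"\w+|\s+|[^\w\s]", text)

def derive_rules(base, branch, context=2):
    """Describe base -> branch as (pattern, replacement) token-list pairs.

    Each pattern is the replaced tokens plus up to `context` unchanged
    neighbouring tokens on each side, which anchor the rule in the other branch.
    """
    matcher = SequenceMatcher(a=base, b=branch, autojunk=False)
    ops = [op for op in matcher.get_opcodes() if op[0] != "equal"]
    rules = []
    for k, (_, i1, i2, j1, j2) in enumerate(ops):
        lo = ops[k - 1][2] if k > 0 else 0                      # end of previous edit
        hi = ops[k + 1][1] if k + 1 < len(ops) else len(base)   # start of next edit
        left = base[max(lo, i1 - context):i1]
        right = base[i2:min(hi, i2 + context)]
        rules.append((left + base[i1:i2] + right, left + branch[j1:j2] + right))
    return rules

def find_all(tokens, pattern):
    n = len(pattern)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == pattern]

def merge(base, ours, theirs, context=2):
    """Apply the rules derived from base -> theirs to the tokens of ours.

    Returns (merged_text, conflicts); a rule whose pattern does not occur
    exactly once in ours is left unapplied and reported as a conflict.
    """
    result = tokenize(ours)
    conflicts = []
    for pattern, replacement in derive_rules(tokenize(base), tokenize(theirs), context):
        hits = find_all(result, pattern) if pattern else []
        if len(hits) == 1:
            start = hits[0]
            result[start:start + len(pattern)] = replacement
        else:
            conflicts.append((pattern, replacement))
    return "".join(result), conflicts

# Both branches edit the same line, which a line-based merge would flag.
base = "x = foo(a) + bar(b)\n"
ours = "x = foo(a, 1) + bar(b)\n"    # adds an argument to foo
theirs = "x = foo(a) + baz(b)\n"     # renames bar to baz
merged, conflicts = merge(base, ours, theirs)
print(merged, end="")  # x = foo(a, 1) + baz(b)
```

Because the two edits touch different tokens of the same line, the single rule derived from `theirs` still finds its anchor in `ours` and applies cleanly. The sketch is deliberately conservative: if `ours` had also changed `bar`, or had already made the same rename, the pattern would not match and the rule would be reported as a conflict.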
