CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
Shangyu Li, Juyong Jiang, Meibo Ren, Sizhe Zhong, Huiri Tan, et al.
TLDR
CodePivot uses reinforcement learning with Python as an intermediate representation to bootstrap multilingual code transpilation in LLMs without parallel corpora.
Key contributions
- Introduces CodePivot, a framework for multilingual code transpilation in LLMs.
- Uses Python as an intermediate representation (IR), removing the need for pairwise parallel corpora (see the sketch after this list).
- Proposes a novel Aggressive-Partial-Functional reinforcement learning reward.
- Outperforms much larger LLMs on transpilation across 10 programming languages.
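The practical payoff of the Python pivot is combinatorial: a pairwise transpilation paradigm over N languages must cover N(N-1) ordered directions, while routing everything through one hub language needs only 2(N-1). A minimal sketch of that counting argument, assuming the hub design described in the abstract (the function names are illustrative, not the paper's):

```python
def pairwise_directions(n: int) -> int:
    # Pairwise paradigm: every ordered (source, target) language pair.
    return n * (n - 1)

def hub_directions(n: int) -> int:
    # Python-pivot paradigm: only X -> Python and Python -> X are needed.
    return 2 * (n - 1)

# For the paper's 10 programming languages:
assert pairwise_directions(10) == 90  # pairwise paradigm
assert hub_directions(10) == 18       # Python-pivot paradigm
```

Per the abstract, CodePivot trains only the Python-to-Others half of this reduced matrix, yet still improves on Others-to-All transpilation.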
Why it matters
This paper addresses a critical limitation of LLM-based code transpilation: training-based methods depend on pairwise parallel corpora, which are scarce for low-resource languages. By eliminating that requirement and introducing an execution-based RL reward, CodePivot makes multilingual transpilation practical across a much broader range of languages.
Original Abstract
Transpilation, or code translation, aims to convert source code from one programming language (PL) to another. It is beneficial for many downstream applications, from modernizing large legacy codebases to augmenting data for low-resource PLs. Recent large language model (LLM)-based approaches have demonstrated immense potential for code translation. Among these approaches, training-based methods are particularly important because LLMs currently do not effectively adapt to domain-specific settings that suffer from a lack of knowledge without targeted training. This limitation is evident in transpilation tasks involving low-resource PLs. However, existing training-based approaches rely on a pairwise transpilation paradigm, making it impractical to support a diverse range of PLs. This limitation is particularly prominent for low-resource PLs due to a scarcity of training data. Furthermore, these methods suffer from suboptimal reinforcement learning (RL) reward formulations. To address these limitations, we propose CodePivot, a training framework that leverages Python as an intermediate representation (IR), augmented by a novel RL reward mechanism, Aggressive-Partial-Functional reward, to bootstrap the model's multilingual transpilation ability without requiring parallel corpora. Experiments involving 10 PLs show that the resulting 7B model, trained on Python-to-Others tasks, consistently improves performance across both general and low-resource PL-related transpilation tasks. It outperforms substantially larger mainstream models with hundreds of billions more parameters, such as DeepSeek-R1 and Qwen3-235B-A22B-Instruct-2507, on Python-to-Others tasks and Others-to-All tasks, respectively. In addition, it outperforms its counterpart trained directly on Any-to-Any tasks on general transpilation tasks. The code and data are available at https://github.com/lishangyu-hkust/CodePivot.
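The abstract names the Aggressive-Partial-Functional reward but does not give its formula here. A hedged sketch of the general shape such a reward could take, assuming execution-based (functional) scoring: partial credit proportional to the fraction of passing unit tests, with an aggressive penalty when the candidate fails to run at all. `run_tests` is a hypothetical sandboxed test runner, and the constants are illustrative, not the paper's.

```python
from typing import List

def run_tests(candidate: str, tests: List[str]) -> List[bool]:
    """Hypothetical sandboxed runner: executes the translated program
    against each unit test and reports pass/fail per test."""
    raise NotImplementedError  # stand-in for an execution sandbox

def apf_reward(candidate: str, tests: List[str]) -> float:
    """Illustrative Aggressive-Partial-Functional-style reward:
    - Partial-Functional: credit scales with the test pass rate,
      rather than an all-or-nothing signal.
    - Aggressive: code that does not execute earns a negative reward
      instead of a flat zero, sharpening the training signal.
    """
    try:
        results = run_tests(candidate, tests)
    except Exception:
        return -1.0  # aggressive penalty for compile/runtime failure
    if not results:
        return -1.0  # no tests ran; treat as a failure
    pass_rate = sum(results) / len(results)
    return pass_rate if pass_rate > 0.0 else -1.0
```

In an RL loop, a reward of this shape would provide gradient signal even when only some tests pass, unlike binary pass/fail formulations.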