CodePivot: Bootstrapping Multilingual Transpilation in LLMs via Reinforcement Learning without Parallel Corpora
Shangyu Li, Juyong Jiang, Meibo Ren, Sizhe Zhong, Huiri Tan, et al.
TLDR
CodePivot uses reinforcement learning with Python as an intermediate representation to bootstrap multilingual code transpilation in LLMs without parallel corpora.
Key contributions
- Introduces CodePivot, a framework for multilingual code transpilation in LLMs.
- Uses Python as an intermediate representation (IR), removing the need for pairwise parallel corpora (see the sketch after this list).
- Proposes a novel Aggressive-Partial-Functional reinforcement learning reward.
- Outperforms much larger LLMs on transpilation across 10 programming languages.
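The practical payoff of the Python pivot is combinatorial: a pairwise transpilation paradigm over N languages must cover N(N-1) ordered directions, while routing everything through one hub language needs only 2(N-1). A minimal sketch of that counting argument, assuming the hub design described in the abstract (the function names are illustrative, not the paper's):

```python
def pairwise_directions(n: int) -> int:
    # Pairwise paradigm: every ordered (source, target) language pair.
    return n * (n - 1)

def hub_directions(n: int) -> int:
    # Python-pivot paradigm: only X -> Python and Python -> X are needed.
    return 2 * (n - 1)

# For the paper's 10 programming languages:
assert pairwise_directions(10) == 90  # pairwise paradigm
assert hub_directions(10) == 18       # Python-pivot paradigm
```

Per the abstract, CodePivot trains only the Python-to-Others half of this reduced matrix, yet still improves on Others-to-All transpilation.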
Why it matters
This paper addresses a critical limitation of LLM-based code transpilation: training-based methods depend on pairwise parallel corpora, which are scarce for low-resource languages. By eliminating that requirement and introducing an execution-based RL reward, CodePivot makes multilingual transpilation practical across a much broader range of languages.
Original Abstract
Transpilation, or code translation, aims to convert source code from one programming language (PL) to another. It is beneficial for many downstream applications, from modernizing large legacy codebases to augmenting data for low-resource PLs. Recent large language model (LLM)-based approaches have demonstrated immense potential for code translation. Among these approaches, training-based methods are particularly important because LLMs currently do not effectively adapt to domain-specific settings that suffer from a lack of knowledge without targeted training. This limitation is evident in transpilation tasks involving low-resource PLs. However, existing training-based approaches rely on a pairwise transpilation paradigm, making it impractical to support a diverse range of PLs. This limitation is particularly prominent for low-resource PLs due to a scarcity of training data. Furthermore, these methods suffer from suboptimal reinforcement learning (RL) reward formulations. To address these limitations, we propose CodePivot, a training framework that leverages Python as an intermediate representation (IR), augmented by a novel RL reward mechanism, Aggressive-Partial-Functional reward, to bootstrap the model's multilingual transpilation ability without requiring parallel corpora. Experiments involving 10 PLs show that the resulting 7B model, trained on Python-to-Others tasks, consistently improves performance across both general and low-resource PL-related transpilation tasks. It outperforms substantially larger mainstream models with hundreds of billions more parameters, such as DeepSeek-R1 and Qwen3-235B-A22B-Instruct-2507, on Python-to-Others tasks and Others-to-All tasks, respectively. In addition, it outperforms its counterpart trained directly on Any-to-Any tasks on general transpilation tasks. The code and data are available at https://github.com/lishangyu-hkust/CodePivot.
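The abstract names the Aggressive-Partial-Functional reward but does not give its formula here. A hedged sketch of the general shape such a reward could take, assuming execution-based (functional) scoring: partial credit proportional to the fraction of passing unit tests, with an aggressive penalty when the candidate fails to run at all. `run_tests` is a hypothetical sandboxed test runner, and the constants are illustrative, not the paper's.

```python
from typing import List

def run_tests(candidate: str, tests: List[str]) -> List[bool]:
    """Hypothetical sandboxed runner: executes the translated program
    against each unit test and reports pass/fail per test."""
    raise NotImplementedError  # stand-in for an execution sandbox

def apf_reward(candidate: str, tests: List[str]) -> float:
    """Illustrative Aggressive-Partial-Functional-style reward:
    - Partial-Functional: credit scales with the test pass rate,
      rather than an all-or-nothing signal.
    - Aggressive: code that does not execute earns a negative reward
      instead of a flat zero, sharpening the training signal.
    """
    try:
        results = run_tests(candidate, tests)
    except Exception:
        return -1.0  # aggressive penalty for compile/runtime failure
    if not results:
        return -1.0  # no tests ran; treat as a failure
    pass_rate = sum(results) / len(results)
    return pass_rate if pass_rate > 0.0 else -1.0
```

In an RL loop, a reward of this shape would provide gradient signal even when only some tests pass, unlike binary pass/fail formulations.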