LLM4C2Rust: Large Language Models for Automated Memory-Safe Code Transpilation
Sarah Bedell, Nazanin Siavash, Armin Moin
TLDR
LLM4C2Rust introduces a RAG-enhanced framework that uses LLMs to automate C/C++-to-memory-safe-Rust transpilation, eliminating raw pointer dereferences and unsafe type casts in several benchmark programs.
Key contributions
- Proposes a RAG-assisted framework integrating LLMs and SLMs for C/C++ to Rust transpilation.
- Deploys a segmentation strategy that processes C/C++ code in balanced blocks, guiding the LLM with retrieved context from Rust documentation and compiler error references.
- Experiments with three OpenAI models (GPT-4o, GPT-4-Turbo, and o3-Mini) show that the RAG-enhanced pipeline improves both code correctness and security.
- Achieves complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs) in the final Rust output for several Coreutils programs.
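To make the RPD/UTC terminology concrete, here is a hypothetical Rust illustration (not taken from the paper): the kind of unsafe code a naive, line-by-line C translation produces, next to the memory-safe form such a pipeline aims to emit. The function names and the example data are invented for illustration.

```rust
use std::ffi::c_void;

// Naive transpilation of C pointer code keeps both hazard patterns:
// an Unsafe Type Cast (UTC) and a Raw Pointer Dereference (RPD).
fn sum_unsafe(data: *const c_void, len: usize) -> i64 {
    // UTC: unchecked pointer cast, typical of mechanical C translation.
    let p = data as *const i32;
    let mut total: i64 = 0;
    for i in 0..len {
        // RPD: dereferencing a raw pointer requires an `unsafe` block;
        // the caller alone upholds the pointer/length contract.
        total += i64::from(unsafe { *p.add(i) });
    }
    total
}

// Memory-safe rewrite: a slice carries its own bounds, so no raw
// pointers or unchecked casts remain in the output.
fn sum_safe(data: &[i32]) -> i64 {
    data.iter().map(|&x| i64::from(x)).sum()
}

fn main() {
    let values = [1, 2, 3, 4];
    let a = sum_unsafe(values.as_ptr() as *const c_void, values.len());
    let b = sum_safe(&values);
    assert_eq!(a, b);
    println!("unsafe: {a}, safe: {b}");
}
```

Counting occurrences of `unsafe` blocks and `as`-pointer casts like these is how the RPD/UTC elimination claim can be checked mechanically on the transpiled output.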
Why it matters
This paper offers an automated, LLM-driven approach to critical memory-safety issues in C/C++ systems, providing a path for modernizing legacy software to Rust while enhancing security and reducing manual effort. This marks a significant step towards safer, more reliable software development.
Original Abstract
Memory safety has long been a critical challenge in software engineering, particularly for legacy systems written in memory-unsafe languages such as C and C++. Rust, one of the youngest modern programming languages, offers built-in memory-safety guarantees that make it a strong candidate for secure systems development. Consequently, transpiling C/C++ code into memory-safe Rust code has become a growing area of research. However, manual transpilation is often time-consuming and error-prone. Additionally, rule-based automated approaches are not as flexible or cost-effective as methods enabled by state-of-the-art AI models, techniques, and methods, such as those that deploy Large Language Models (LLMs), for example, Generative Pretrained Transformers (GPT). In this paper, we propose a Retrieval-Augmented Generation (RAG)-assisted framework that integrates an LLM with a Small Language Model (SLM) to perform C/C++-to-Rust transpilation with a focus on enhancing memory safety. The framework deploys a segmentation strategy that processes C/C++ code in balanced blocks, guiding the LLM with retrieved context from Rust documentation and compiler error references. Our experiments using three OpenAI models (GPT-4o, GPT-4-Turbo, and o3-Mini) demonstrate that the RAG-enhanced pipeline generally improves both code correctness and security for C-to-Rust code transpilation. Several Coreutils programs achieve complete elimination of Raw Pointer Dereferences (RPDs) and Unsafe Type Casts (UTCs) in the final Rust output, indicating the potential of LLM-based transpilation for advancing automated software modernization and repair, as well as memory-safe code generation.
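The abstract's "balanced blocks" segmentation step can be sketched as follows. This is an assumption-laden illustration, not the paper's published algorithm: it splits a C source file only at top-level closing braces (tracked by brace depth) so functions stay intact and each block holds roughly a target number of lines; real C would also need comment- and string-aware parsing.

```rust
// Hypothetical sketch of segmenting C source into balanced blocks for
// per-block LLM transpilation. `target` is the rough lines-per-block.
fn segment(source: &str, target: usize) -> Vec<String> {
    let mut blocks = Vec::new();
    let mut current = String::new();
    let mut lines_in_block = 0;
    let mut depth: i64 = 0;
    for line in source.lines() {
        // Track brace nesting so we only cut between top-level items.
        // (Naive: ignores braces inside strings and comments.)
        depth += line.matches('{').count() as i64;
        depth -= line.matches('}').count() as i64;
        current.push_str(line);
        current.push('\n');
        lines_in_block += 1;
        if depth == 0 && lines_in_block >= target {
            blocks.push(std::mem::take(&mut current));
            lines_in_block = 0;
        }
    }
    if !current.trim().is_empty() {
        blocks.push(current);
    }
    blocks
}

fn main() {
    let c_src = "int add(int a, int b) {\n    return a + b;\n}\n\
                 int sub(int a, int b) {\n    return a - b;\n}\n";
    let blocks = segment(c_src, 3);
    assert_eq!(blocks.len(), 2);
    for b in &blocks {
        println!("--- block ---\n{b}");
    }
}
```

Each resulting block would then be paired with retrieved Rust documentation and compiler-error references before being sent to the LLM, per the framework described above.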