Introduction to the Study and Its Objectives

The integration of artificial intelligence (AI) into the programming ecosystem has transformed the field, with particularly significant advances in code translation. This article discusses a series of previous studies and analyzes the results reported in those foundational works.

The conclusions presented here are not a simple compilation of those findings: they combine the data extracted from the cited studies with original research conducted by the authors. The result is a synthesis that pairs existing knowledge with new perspectives and contributes to the field of AI-assisted code translation.

For the research, the large language models (LLMs) that showed the greatest potential for code conversion were selected. This choice was made to answer the main research questions (RQs): how effective LLMs are at code translation, what the nature of their translation errors is, and what solutions might mitigate these difficulties.

Code translation aims to convert source code from one programming language (PL) to another. Given the promising abilities of LLMs in code synthesis, researchers are actively exploring their potential to automate code translation.
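As an illustration of the basic workflow, the sketch below shows how a general LLM might be prompted to translate a single function from Python to Java. It assumes the OpenAI Python client and GPT-4 access; the prompt wording and the example function are illustrative and are not the exact prompts used in the studies discussed here.

```python
# Minimal sketch of LLM-driven code translation (illustrative only; not the
# exact prompt or pipeline used in the studies cited in this article).
from openai import OpenAI  # assumes the OpenAI Python client is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

PYTHON_SOURCE = '''
def factorial(n: int) -> int:
    return 1 if n <= 1 else n * factorial(n - 1)
'''

prompt = (
    "Translate the following Python code to Java. "
    "Preserve the behavior exactly and output only the Java code.\n\n"
    f"{PYTHON_SOURCE}"
)

response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # deterministic output makes translations easier to evaluate
)

print(response.choices[0].message.content)  # the candidate Java translation
```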

Understanding the limitations of LLMs is essential to advancing the state of LLM-based code translation. To this end, large-scale studies have investigated the capabilities of LLMs, both general LLMs and code LLMs, in translating code between language pairs drawn from C, C++, Go, Java, and Python.
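With five languages in scope, translation is evaluated over every ordered source/target pair. The short snippet below only enumerates those directions to make the experimental grid concrete; it is an illustration, not part of the cited studies' tooling.

```python
from itertools import permutations

LANGUAGES = ["C", "C++", "Go", "Java", "Python"]

# Every ordered (source, target) pair: 5 * 4 = 20 translation directions.
pairs = list(permutations(LANGUAGES, 2))
for source, target in pairs:
    print(f"{source} -> {target}")
print(f"{len(pairs)} translation directions")
```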

General LLMs

General LLMs are models pre-trained on vast datasets of textual information, encompassing both natural language and programming code, and they are designed to be applied across a wide range of tasks. The referenced studies focused on leading general LLMs: open models of up to 20 billion parameters drawn from Hugging Face's Open LLM Leaderboard, such as Llama 2, TB-Airoboros, and TB-Vicuna, alongside GPT-4. These represent a diverse selection of highly versatile AI models.

Code LLMs

Code LLMs are developed for a specific purpose: to facilitate and automate tasks directly associated with programming. The studies mentioned focused on three recently released code LLMs, CodeGen, StarCoder, and CodeGeeX, which represent the current state of AI specialization for coding tasks.
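For readers who want to try this kind of experiment with an open code LLM, the sketch below loads StarCoder through the Hugging Face transformers library and asks it for a translation. The checkpoint name `bigcode/starcoder`, the comment-style prompt, and the generation settings are assumptions made for illustration; they are not taken from the cited studies.

```python
# Illustrative sketch: asking an open code LLM for a translation.
# Assumes transformers (and accelerate for device_map="auto") plus access to
# the bigcode/starcoder checkpoint; these are assumptions, not the study setup.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "bigcode/starcoder"  # assumed checkpoint name for illustration

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

prompt = (
    "# Translate this Python function to Go.\n"
    "def add(a, b):\n"
    "    return a + b\n"
    "# Go translation:\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```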

Figure: the LLMs detailed in this article.

Effectiveness of LLMs in Code Translation

Except for GPT-4 and StarCoder, all other models performed unsatisfactorily.

There is a strong correlation between the average number of tests per code sample and the rate of unsuccessful translations, with a correlation coefficient (r) ranging from 0.64 to 0.85 across the models. In other words, the more rigorous the existing test set, the better it can evaluate whether a translation preserves functionality.
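The kind of correlation the studies report can be reproduced in principle as shown below. The numbers here are hypothetical placeholders, not data from the cited paper, and r is computed as the usual Pearson coefficient.

```python
# Hypothetical illustration of the reported relationship: average tests per
# sample versus unsuccessful-translation rate. All values below are made up.
import numpy as np

avg_tests_per_sample = np.array([2.0, 3.5, 5.0, 7.5, 10.0])      # hypothetical
unsuccessful_rate    = np.array([0.55, 0.72, 0.60, 0.80, 0.78])   # hypothetical

# Correlation coefficient r between the two series.
r = np.corrcoef(avg_tests_per_sample, unsuccessful_rate)[0, 1]
print(f"r = {r:.2f}")  # roughly 0.78 for these made-up values,
                       # comparable to the 0.64-0.85 range the studies report
```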

In this study (https://arxiv.org/pdf/2308.03109.pdf), 43,379 translations were performed across all of the listed LLMs, with success measured against the tests provided with the code samples.
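Success in that setting means the translated program passes the tests that ship with the sample. The sketch below gives a much-simplified version of that idea for Python candidates only, using pytest; the actual study's harness had to compile and execute code in several languages and is not reproduced here.

```python
# Simplified sketch of test-based evaluation of translated code (Python-only).
# This is an assumption-laden illustration, not the study's real harness.
import subprocess
from pathlib import Path

def translation_passes_tests(candidate_dir: Path) -> bool:
    """Run the tests shipped with a sample against the translated code in candidate_dir."""
    try:
        result = subprocess.run(
            ["python", "-m", "pytest", "-q"],
            cwd=candidate_dir,
            capture_output=True,
            text=True,
            timeout=60,  # guard against non-terminating translations
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0

def success_rate(candidate_dirs: list[Path]) -> float:
    """Fraction of translated samples whose provided tests all pass."""
    passed = sum(translation_passes_tests(d) for d in candidate_dirs)
    return passed / len(candidate_dirs) if candidate_dirs else 0.0
```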

The study also highlighted a stark contrast in LLM performance between real-world projects and crafted benchmarks. GPT-4 managed a success rate of 8.1% in real-world projects, while the other models achieved no success at all.

Figure: unsuccessful translation rate (%).