Code Migration with Formal Verification for Performance Improvement of Legacy Code
Domain-specific languages (DSL) have been a common alternative to general-purpose programming languages to implement domain-specific optimizations and simplify programming of domain-specific hardware. For instance, tremendous breakthroughs in AI applications seen in the past few years have been partly because of and have also led to developments in AI-specific hardware accelerators and programming frameworks such as PyTorch, TensorFlow, among others.
Although DSLs have simplified AI application development, they come with two inherent problems: (i) programming with DSLs presents a steep learning curve for programmers [1] who are otherwise well-versed in general-purpose languages such as C++ and Python, and (ii) domain-specific hardware, which is programmed via DSLs, does not support code written in general-purpose languages, turning existing code into "legacy code" that cannot benefit from the latest advancements in domain-specific hardware (e.g., AI accelerators). Legacy code also imposes a maintenance burden and carries security risks.
Porting legacy code to newer, popular DSLs has thus become an important business as well as research problem. The problem of transforming and compiling code from one language to another is called code transpilation. Because manual code transpilation is effort-intensive, cumbersome, error-prone, and expensive, considerable research has been devoted to automated code transpilation approaches. Unfortunately, many of these approaches are either DSL-specific (and hence not extensible) [2, 3] or do not verify the generated code. Indeed, a key characteristic of a code transpilation tool is whether it ensures that the translated code is correct. We, at CodeMetal, have been building systems that address the need for automated code transpilation across programming languages for different edge hardware. More importantly, we are developing fully automated systems that translate code from one language to another while ensuring both the correctness and the performance of the output code on various target hardware.
In this article, we cover one of our prior research efforts, developed in collaboration with Prof. Alvin Cheung and his research group at the University of California, Berkeley.
Verified Translation of C++ Programs to AI-Specific DSLs
The rapid pace of innovation in AI has led to AI-specific hardware accelerators as well as new software frameworks (e.g., PyTorch, TensorFlow) that enable programmers to leverage those accelerators. These frameworks also simplify both AI application development and deployment. Given the popularity of Python, which pre-dated these frameworks, all of them provide Python-like programming languages with new APIs for various AI computations. Learning these new languages involves a steep learning curve, and, more importantly, legacy code (written in other languages) cannot benefit from the latest hardware accelerators until it is ported to these new frameworks.
In this research effort, we address the problem of migrating legacy code to AI-specific DSLs (called tensor DSLs), such as NumPy, TensorFlow, and PyTorch, via a framework called Tenspiler. Tenspiler approaches code transpilation as a program synthesis and verification problem: the synthesis step searches for a program in the target language, while the verification step verifies that the found target program is semantically equivalent to the source program. Conceptually, this approach is simpler than the commonly used rule-based pattern-matching approach (found in source-to-source translation tools and compilers) because it does not require writing or maintaining translation rules. Instead, Tenspiler discovers source-to-target translations by employing program synthesis. Specifically, Tenspiler uses verified lifting [4], a program synthesis technique that uses an intermediate language (IR) to simplify both synthesis and verification.
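To make the idea of lifting concrete, here is a simplified example of our own (not taken from the Tenspiler paper): a scalar loop and the single tensor operation that a lifting-based transpiler would search for as its semantically equivalent "summary" in NumPy.

```python
import numpy as np

# A hypothetical source kernel: an elementwise scaled add written as a loop.
def scaled_add_loop(a, b, alpha):
    out = []
    for i in range(len(a)):
        out.append(a[i] + alpha * b[i])
    return out

# The lifted form the synthesizer would look for: the same computation
# expressed as one tensor expression in the target DSL (NumPy here).
def scaled_add_lifted(a, b, alpha):
    return np.asarray(a) + alpha * np.asarray(b)
```

Verification then amounts to proving that the loop and the one-line tensor expression compute the same result for all inputs, not just the ones we happen to test.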
TensIR - Tenspiler's IR for Tensor Operations
While we could conceptually translate an input program in a source language directly into an output program in a target language via program synthesis, doing so has two drawbacks: (1) such a translator would be specific to the source and target languages, and (2) the grammar of the target language (which determines the set of possible programs in the target language, also called the search space) may allow programs of unbounded depth (e.g., via loops or recursion), leading to an unbounded search space. The solution adopted in the program synthesis community is to develop an intermediate language (IR) that captures the semantics of the target language while restricting the size of the search space.
In Tenspiler, we developed an IR called TensIR that captures commonly used tensor operations (such as matrix multiplication and tensor-scalar operations) and types (e.g., scalars, vectors, and tensors) from popular tensor DSLs. More importantly, the grammar of TensIR restricts the search space of possible TensIR programs while translating input programs into tensor DSLs. We show the TensIR grammar below.
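To give a rough feel for what such an IR looks like, here is a small, purely illustrative sketch of our own; the node names and the tiny set of operators below are hypothetical and much smaller than the actual TensIR grammar in the paper.

```python
from dataclasses import dataclass
from typing import Union

# Hypothetical IR node types for a tensor IR; names are ours, not TensIR's.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    value: float

@dataclass(frozen=True)
class ElemwiseAdd:      # vector + vector, elementwise
    left: "Expr"
    right: "Expr"

@dataclass(frozen=True)
class ScalarMul:        # scalar * vector
    scalar: "Expr"
    vec: "Expr"

Expr = Union[Var, Const, ElemwiseAdd, ScalarMul]

def interpret(e: Expr, env: dict):
    """Reference semantics: vectors are Python lists of floats."""
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Const):
        return e.value
    if isinstance(e, ElemwiseAdd):
        l, r = interpret(e.left, env), interpret(e.right, env)
        return [x + y for x, y in zip(l, r)]
    if isinstance(e, ScalarMul):
        s, v = interpret(e.scalar, env), interpret(e.vec, env)
        return [s * x for x in v]
    raise TypeError(f"unknown IR node: {e}")
```

The key point is that each IR node has precise, executable semantics, which is what makes both enumeration (synthesis) and equivalence checking (verification) tractable.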

Design of Tenspiler
The figure below shows the overall design of Tenspiler. In essence, it is divided into three main components: synthesis, verification, and code generation. We cover each of the three steps below.

Program Synthesis - Synthesizing TensIR Program
Now that we have discussed TensIR and its purpose, we turn to how our program synthesis step leverages it. The goal of Tenspiler's synthesis step is to search for a TensIR program that is semantically equivalent to the input program in the source language. Technically, this semantically equivalent program in the target language is called a program summary (PS) in the field of program synthesis. In addition to the PS, our synthesis step also outputs loop invariants (INV) for the input program. Conceptually, loop invariants are boolean expressions that capture relationships between the input and output variables of a loop and hold on every iteration of the loop. INVs are required for our synthesis problem because input programs contain loops, and loop invariants are needed to prove the equivalence of such programs.
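To make loop invariants concrete, consider this small example of our own: for a loop that sums a list, the invariant `acc == sum(xs[:i])` holds before and after every iteration, and at loop exit it yields exactly the equality (`acc == sum(xs)`) that relates the loop to its summary.

```python
def sum_with_invariant(xs):
    acc = 0
    i = 0
    # Loop invariant: acc == sum(xs[:i]) holds on every iteration.
    while i < len(xs):
        assert acc == sum(xs[:i])   # invariant before the loop body
        acc += xs[i]
        i += 1
        assert acc == sum(xs[:i])   # invariant after the loop body
    # On exit, i == len(xs), so the invariant gives acc == sum(xs):
    # precisely the equivalence a verifier needs between loop and summary.
    return acc
```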
In Tenspiler, we formulate the search problem as a Syntax-guided Synthesis problem [5] (SyGuS for short). The key idea in syntax-guided synthesis is that the syntax of the target language (TensIR in our case) guides the synthesis (i.e., search) of the target program. Our SyGuS problem is characterized by three parameters:
- The specification: it describes the program property that the synthesized TensIR program should satisfy – intuitively, it is the condition that checks if the synthesized program is equivalent to the input program. In Tenspiler, we use verification conditions (VCs) as the specification for the synthesis phase as VCs provide full guarantees (up to a bound) on the equivalence of the input program and the synthesized program. Intuitively, VCs are logical expressions that encode the semantics of the input program.
- The search space that describes the set of possible TensIR programs. We use TensIR grammar to define the search space. Conceptually, the search space imposes syntactic constraints on the structure of the output TensIR program.
- The search algorithm that searches for the TensIR program. In Tenspiler, we use an enumerative algorithm, meaning it enumerates all possible programs up to a certain depth.
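The three ingredients above can be sketched with a toy enumerative synthesizer of our own, far simpler than a real SyGuS solver: a tiny expression grammar defines the search space, input/output examples stand in for the specification (a real solver would use verification conditions), and breadth-limited enumeration is the search algorithm.

```python
import itertools

# Toy grammar over one variable x: the syntactic search space.
LEAVES = ["x", "1"]
OPS = ["+", "*"]

def enumerate_exprs(depth):
    """Enumerate all expression strings of the grammar up to the given depth."""
    if depth == 0:
        yield from LEAVES
        return
    yield from enumerate_exprs(depth - 1)
    for op in OPS:
        for l, r in itertools.product(list(enumerate_exprs(depth - 1)), repeat=2):
            yield f"({l} {op} {r})"

def synthesize(examples, max_depth=2):
    """Return the first enumerated expression consistent with all examples."""
    for expr in enumerate_exprs(max_depth):
        if all(eval(expr, {"x": x}) == y for x, y in examples):
            return expr
    return None  # no program of this depth satisfies the spec
```

For instance, given the examples `[(1, 2), (2, 4)]`, the enumerator finds a small expression equivalent to doubling; bounding the depth is what keeps the search space finite, mirroring the role TensIR's grammar plays.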
Verification
Given the limitations of current SyGuS solvers, our synthesis step validates the PS and INV against the VC only for a bounded set of program states. In other words, the synthesis step alone does not guarantee full semantic equivalence between the input program and the synthesized TensIR program. To ensure complete semantic equivalence, we need a verification step. In Tenspiler, we leverage an existing SMT solver (CVC5) and ask it to find a program state (a counterexample) under which the synthesized PS and INV violate the VC. If the solver cannot find such a counterexample, full semantic equivalence is established; otherwise, we add the found counterexample as a clause to the VC and repeat the step.
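This interplay between bounded synthesis and full verification is the classic counterexample-guided refinement loop. A minimal sketch of our own, where exhaustive checking over a small finite domain stands in for an SMT solver such as CVC5:

```python
def cegis(candidates, equivalent_on, domain):
    """Counterexample-guided search (toy sketch).

    candidates:    iterable of candidate programs (callables).
    equivalent_on: (candidate, x) -> bool; True if the candidate matches
                   the source program on input x.
    domain:        the full space of program states; finite here, whereas
                   a real verifier would query an SMT solver instead.
    """
    examples = []  # counterexamples gathered so far
    for cand in candidates:
        # Synthesis step: candidate must agree on all known counterexamples.
        if not all(equivalent_on(cand, x) for x in examples):
            continue
        # Verification step: search the whole domain for a counterexample.
        cex = next((x for x in domain if not equivalent_on(cand, x)), None)
        if cex is None:
            return cand          # verified equivalent on the whole domain
        examples.append(cex)     # refine the spec and keep searching
    return None
```

Each counterexample prunes future candidates cheaply, so the expensive full-domain check runs only on candidates that already survive all known failures.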
Code Generation
TensIR, as an IR, keeps the program synthesis step tractable and allows us to verify the semantic equivalence between the input and output programs. Since our goal is to generate an output program in the target language, we implement a rule-based code generator that accepts a TensIR program as input and emits the corresponding program in the target language. Tenspiler accepts input programs written in C/C++/Python and can generate output programs in six different AI-specific DSLs: NumPy, TensorFlow, PyTorch, Apple MLX, TPC-C, and Gemmini.
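A rule-based backend of this sort is essentially a tree walk that maps each IR operator to target-language syntax. A minimal sketch of our own (a toy tuple-based IR emitting NumPy-flavored strings; Tenspiler's real backends cover far more operators and targets):

```python
def codegen_numpy(node):
    """Map a toy IR tree to a NumPy expression string.

    IR nodes here are our own illustration: ("var", name),
    ("add", left, right) for elementwise addition, and
    ("smul", scalar, vec) for scalar-times-tensor.
    """
    kind = node[0]
    if kind == "var":
        return node[1]
    if kind == "add":
        return f"np.add({codegen_numpy(node[1])}, {codegen_numpy(node[2])})"
    if kind == "smul":
        return f"np.multiply({node[1]}, {codegen_numpy(node[2])})"
    raise ValueError(f"unknown IR node: {kind}")
```

Because the IR was already verified against the source program, this last step only needs syntax-directed rules per target DSL; no further semantic search is required.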
Conclusion
All in all, the results show that Tenspiler could migrate all 69 C++ programs (belonging to different domains) to different tensor DSLs and, more importantly, formally verified that the generated DSL programs are semantically equivalent to the input C++ programs.
Finally, our performance evaluation showed that Tenspiler migrates input programs in a reasonably short time (less than 15 minutes). More importantly, the migrated programs enjoy considerable performance improvements (from 15X to 557X) when executed on the latest AI accelerators or even commodity CPU platforms.
Related Research
UniPar: A Unified LLM-Based Framework for Parallel and Accelerated Code Translation in HPC
Introduces UniPar, an evaluation framework for assessing how large language models translate between parallel programming languages, achieving 69% compilation success and 33% functional correctness through fine-tuning and compiler-guided repair.
Workflows vs Agents for Code Translation
Compares structured workflows versus agentic approaches for MATLAB-to-HDL translation, showing that agentic methods with the Model Context Protocol increase simulation reach rates by over 20 percentage points on mid-sized models.