What is a transpiler?

Julian Konchunas
Published in Madfish Solutions
5 min read · Feb 8, 2020

A little journey into how one programming language gets translated into another, using our Solidity to LIGO transpiler as an example

“Transpilation”, unlike “compilation”, is the process of translating code from one language into another when both sit at the same level of abstraction. Take assembly, for instance. It is a low-level language whose instructions correspond almost directly to commands executed by the processor. Rust and C, on the other hand, operate on abstract mathematical entities such as variables and functions. Only later are these entities compiled (hence the name) into processor instructions. But if you translate C to Rust, the process is called “transpilation”, since both are high-level languages.

How it is done

The process of translation is quite similar to compilation. In a nutshell, the idea is to parse the source language and generate output in the target language. In between sits an intermediate representation called an Abstract Syntax Tree, or AST for short. You may think of an AST as a tree-like JSON structure containing every word and instruction of the source code. And since it is a tree, every language unit is aware of its own properties and also stores pointers to its parents and siblings. For example, this code:

Would be roughly represented as the following AST:
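As a rough stand-in illustration, here is a minimal sketch of such a tree written as nested Python dictionaries. The statement, the node names ("Assignment", "BinaryOperation" and so on) and the helper `node_types` are hypothetical and chosen for readability; they are not the schema solc emits. For brevity the sketch also omits the parent and sibling pointers mentioned above.

```python
# Hypothetical source statement and a hand-written tree for it.
source = "total = price + 1"

ast_tree = {
    "type": "Assignment",
    "target": {"type": "Identifier", "name": "total"},
    "value": {
        "type": "BinaryOperation",
        "operator": "+",
        "left": {"type": "Identifier", "name": "price"},
        "right": {"type": "Literal", "value": 1},
    },
}


def node_types(node):
    """Return node types in depth-first order, visiting children in dict order."""
    found = [node["type"]]
    for child in node.values():
        if isinstance(child, dict):  # nested dicts are child nodes
            found.extend(node_types(child))
    return found


print(node_types(ast_tree))
```

Walking the tree this way is the basic operation that every later stage relies on.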

In detail

Transpiling a language generally involves a few stages: parsing, transformation, type inference, and translation.

The first stage is Parsing. The parser takes the input language and splits it into syntactic units to create the aforementioned AST. Luckily for us, the Solidity compiler solc can produce such an AST in JSON format out of the box. Parsing a language is a complex task, and hundreds of articles already cover it. So let’s move on to the interesting parts and assume our input language is already neatly parsed into an AST.

The next one is Transformation: the process of modifying a syntax tree to get data better suited to the target language. This stage adapts the semantics of the language, since even seemingly equivalent statements may require special treatment. For example, in Solidity we can write the widely used C-like for loop with 3 expressions like this:

for (init; condition; step) {
  body;
}

LIGO doesn’t support such a statement, but as you may have guessed, it is possible to convert any for loop into a while loop like so:

init;
while (condition) {
  body;
  step;
}

And that’s exactly where the transformer fits. It removes a for3 node from the AST and creates a logically equivalent while statement to replace it. Keep in mind that it operates on the syntax tree and not on the actual language text yet. A transformer accepts an AST and outputs a modified AST by recursively traversing its nodes. You can take a look at our for3 transformer for sol2ligo here for a better understanding.
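The rewrite described above can be sketched in a few lines. This is a minimal illustration and not the sol2ligo transformer: the node kinds ("For3", "While", "Block", "Stmt") and the function `transform` are hypothetical names, and statements are kept as opaque text to keep the example short.

```python
def transform(node):
    """Rebuild the tree recursively, replacing each For3 node with init + While."""
    if isinstance(node, list):
        return [transform(item) for item in node]
    if not isinstance(node, dict):
        return node  # plain values (names, text) pass through unchanged
    # Transform children first so that nested loops are rewritten too.
    node = {key: transform(value) for key, value in node.items()}
    if node.get("type") != "For3":
        return node
    # The step expression moves to the end of the loop body.
    loop_body = {
        "type": "Block",
        "statements": node["body"]["statements"] + [node["step"]],
    }
    loop = {"type": "While", "condition": node["condition"], "body": loop_body}
    # The init statement runs once, before the loop.
    return {"type": "Block", "statements": [node["init"], loop]}


for3 = {
    "type": "For3",
    "init": {"type": "Stmt", "text": "i = 0"},
    "condition": {"type": "Stmt", "text": "i < 10"},
    "step": {"type": "Stmt", "text": "i = i + 1"},
    "body": {"type": "Block", "statements": [{"type": "Stmt", "text": "sum = sum + i"}]},
}

result = transform(for3)
print(result["statements"][1]["type"])
```

Note that this naive version is only equivalent for simple bodies. A `continue` inside the original loop would skip the relocated step in the while form, and wrapping the result in a block changes the scope of the init variable, so a production transformer has to handle such cases explicitly.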

Of course, sol2ligo needs quite a few more transformers for cases of varying complexity. They perform a wide range of operations: generating a router, emulating state, inlining modifiers, unrolling libraries, taking care of inheritance, and so forth.

Type inference is a special step required to make better-informed decisions when translating types. Since the LIGO compiler can’t do type inference on its own, we kindly lend a helping hand here. Let’s consider returning a value from a function, a frequently occurring case. If you want to use the returned value, you have to declare a variable like this:

const v : <type> = func()

Basically, the type inferrer’s task is to analyze all function declarations beforehand and remember their return types so they are available at every function call. The same thing happens when we deal with variable declarations. When we need to translate an assignment, we just peek at the right-hand side of the expression and infer the type from what is going on there.
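The two-pass idea described here can be sketched as follows. This is a simplified illustration under assumed node shapes, not the sol2ligo module: `collect_return_types`, `infer_type`, `declare` and the function `get_balance` are hypothetical names.

```python
def collect_return_types(functions):
    """First pass: remember the declared return type of every function."""
    return {fn["name"]: fn["returns"] for fn in functions}


def infer_type(expr, return_types, variable_types):
    """Infer the type of a right-hand side from what is already known."""
    kind = expr["type"]
    if kind == "Call":
        return return_types[expr["callee"]]
    if kind == "Identifier":
        return variable_types[expr["name"]]
    raise ValueError(f"cannot infer a type for node kind {kind!r}")


def declare(name, expr, return_types, variable_types):
    """Emit a typed declaration and record the new variable's type."""
    inferred = infer_type(expr, return_types, variable_types)
    variable_types[name] = inferred
    rhs = f"{expr['callee']}()" if expr["type"] == "Call" else expr["name"]
    return f"const {name} : {inferred} = {rhs}"


functions = [{"name": "get_balance", "returns": "nat"}]
return_types = collect_return_types(functions)
variable_types = {}

line1 = declare("v", {"type": "Call", "callee": "get_balance"}, return_types, variable_types)
line2 = declare("w", {"type": "Identifier", "name": "v"}, return_types, variable_types)
print(line1)
print(line2)
```

Collecting the return types up front means a call can be typed even when the function is declared later in the file than the call site.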

Sometimes it is not easy to infer a type from the data of the node’s closest siblings. That is why our type inference is divided into two stages. Roughly speaking, the first one identifies the type from an AST node and its neighbors, and the second one may perform line traversal to fetch type info from farther away. Sometimes even this may not be enough, since statements like var x = 0; simply do not provide enough information about the type. Solidity treats x as some ambiguous number type, and we have to look at later operations to understand whether it was intended to be an int or a nat.
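To illustrate the idea of looking at later operations, here is a deliberately small sketch. The heuristic (choose int if the variable is ever assigned a negative literal or a value already known to be int, otherwise fall back to nat) is an assumption made for this example and is not taken from sol2ligo; `resolve_number_type` and the node shapes are hypothetical.

```python
def resolve_number_type(name, later_statements, known_types):
    """Pick int or nat for a variable that was declared from an ambiguous literal."""
    for stmt in later_statements:
        if stmt["type"] != "Assign" or stmt["target"] != name:
            continue  # only assignments to this variable carry evidence here
        value = stmt["value"]
        if value["type"] == "NumberLiteral" and value["value"] < 0:
            return "int"  # a negative literal cannot be a nat
        if value["type"] == "Identifier" and known_types.get(value["name"]) == "int":
            return "int"  # assigned from a variable already known to be signed
    return "nat"  # no evidence of signedness was found


later = [
    {"type": "Assign", "target": "y", "value": {"type": "NumberLiteral", "value": 5}},
    {"type": "Assign", "target": "x", "value": {"type": "NumberLiteral", "value": -1}},
    {"type": "Assign", "target": "z", "value": {"type": "Identifier", "name": "delta"}},
]

print(resolve_number_type("x", later, {}))
print(resolve_number_type("y", later, {}))
print(resolve_number_type("z", later, {"delta": "int"}))
```

A real inferrer would need to consider many more kinds of evidence, such as arithmetic, comparisons and function arguments.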

If we got you interested, our type inference module can be found here.

Finally, the Translation stage (sometimes called “generation”) is what produces actual code from a syntax tree. You pass an AST as input, and it outputs valid, formatted LIGO code, knowing how to interpret each node of the tree. It understands where to put brackets and which statements should be grouped into blocks. It recursively visits every node down to the leaves, so a monstrosity like this:

effectively gets translated into a neat little function like this:
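To make the idea concrete on a much smaller, hypothetical tree (this is not the sol2ligo example itself), here is a minimal sketch of a recursive generator. The node kinds and the functions `generate` and `expr` are assumptions for illustration, and the emitted text is only PascaLIGO-flavoured: it has not been checked against the LIGO compiler.

```python
def expr(node):
    """Render an expression node, bracketing every binary operation."""
    kind = node["type"]
    if kind == "Identifier":
        return node["name"]
    if kind == "Literal":
        return str(node["value"])
    if kind == "BinaryOperation":
        return f"({expr(node['left'])} {node['operator']} {expr(node['right'])})"
    raise ValueError(f"unknown expression node {kind!r}")


def generate(node, indent=0):
    """Render a statement node, indenting nested blocks by two spaces."""
    pad = "  " * indent
    kind = node["type"]
    if kind == "Block":
        return "\n".join(generate(stmt, indent) for stmt in node["statements"])
    if kind == "Assign":
        return f"{pad}{node['target']} := {expr(node['value'])};"
    if kind == "While":
        body = generate(node["body"], indent + 1)
        return f"{pad}while {expr(node['condition'])} block {{\n{body}\n{pad}}};"
    raise ValueError(f"unknown statement node {kind!r}")


def ident(name):
    return {"type": "Identifier", "name": name}


def lit(value):
    return {"type": "Literal", "value": value}


def binop(left, operator, right):
    return {"type": "BinaryOperation", "operator": operator, "left": left, "right": right}


tree = {"type": "Block", "statements": [
    {"type": "Assign", "target": "i", "value": lit(0)},
    {"type": "While", "condition": binop(ident("i"), "<", lit(10)),
     "body": {"type": "Block", "statements": [
         {"type": "Assign", "target": "i", "value": binop(ident("i"), "+", lit(1))},
     ]}},
]}

output = generate(tree)
print(output)
```

Bracketing every binary operation is a blunt but safe choice: the generator never has to reason about operator precedence in the target language, at the cost of some redundant parentheses.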

That is basically how your Solidity contract gets translated into a LIGO contract. It is a simple enough process, but it requires careful thinking. We always keep in mind that similar-looking code may not work properly and may even introduce subtle vulnerabilities that did not exist in the original source code.

You may take a look at our project here: sol2ligo on GitHub. Give us a star to show your support!

This is a follow-up to our story on the Solidity to LIGO transpiler funded by the Tezos Foundation. Check out our introduction story here if you want to learn more.

Julian Konchunas
Madfish Solutions

Blockchain engineer at @madfish.solutions. Ex-Ubisoft. Telegram channel https://t.me/tenxer