Implementing unused variable elimination and undefined variable detection for the Solang compiler

Lucas Ste
Coinmonks
5 min readDec 6, 2021

--

Photo by Clint Adair on Unsplash

In June 2021, I applied for a mentorship through the Linux Foundation and got accepted to contribute to the Solang compiler and implement unused variable elimination, undefined variable detection and common subexpression elimination. In this article, I will explain my techniques for unused variable elimination and undefined variable detection.

Solang is a Solidty compiler that targets Solana, Substrate and ewasm. It is written in Rust and uses LLVM as the backend. It allows us to write many optimizations for the Solidity language before invoking LLVM. Solang is at an early stage, but has some compiler infrastructure, such as a parse tree, an abstract syntax tree (AST) and an intermediate representation for the control flow graph (CFG). It has also a few optimizations, such as constant folding and strength reduction.

There was no existing implementation of unused variable elimination and undefined variable detection, so it was my job as a mentee to implement this. The first goal was to detect unused variables. Every local variable is placed inside a symbol table (see the Github gist), so that we can tracks its references throughout the code.

Storage variables, in turn, are tacked at each contract, using a vector of variables:

During Solang’s semantic analysis, we traverse the parse tree and build the abstract syntax tree. Whenever we reach an expression node, we can detect any variable that is referenced inside the expression. By expression node, I am referring to nodes that represent basic expressions, like addition, subtraction, boolean AND, etc. For variables names in expressions, we do a lookup for the index of the variable name in the symbol table and create an AST node with a reference to the variable index. Likewise, a storage variable AST node contains its index in the contract’s variable vector.

These variable indexes allow us to retrieve reference to those entities, so that we can signal that the variables have been read inside the Solidity code. Using the same aforementioned reasoning, we can identify assignments during the parse tree traversal and leverage the variable nodes to mark variables as assigned.

The logic for identifying unused variables also work for detecting event definitions that have never been emitted. The main difference here is that events are tracked inside Namespace, at an events vector, as shown below.

After the semantic analysis, we traverse the symbol table, the contract’s variable vector and the namespace’s events vector to identify unused variables and events, and emit compilation warnings.

It’s important to mention that there is an edge case here. We cannot mark public contract variables as unused because they can be accessed directly through a generated accessor function, which we can call in a blockchain transaction, even though they are not referenced anywhere in the Solidity code.

After knowing exactly which variables have never been used, we can implement the unused variable elimination. Our code allows Solang to remove the following cases from the intermediate representation:

  • Variables that have never been read nor assigned.
  • Variables that have been assigned, but never read.
  • Assignments of unused variables, including destructures.
  • Assignments of unused variables inside expressions (e.g. a = b + (c=1))

To do so, we leverage Solang’s AST traversal during code generation. Solang utilized such a step to create the control flow graph (CFG). Whenever we reach a variable declaration, we heck the symbol table or the contract’s vector of variables to check whether the declaration should be removed from the CFG, as the following gist shows:

Likewise, when we reach an assignment node, we check if the entity on the left hand side of the assignment is a variable that has never been used. If so, we can just ignore the assignment and continue building the CFG. The complex case happens when there is an assignment within an expression (e.g. a = b + (c=1)). Solang already differentiates an assignment statement and an assignment expression. The former is the standalone assignment command (e.g. a = b+1), and the latter is the assignment inside an expression. This way, we can treat such a special case without worrying about standalone assignments.

If the assignment inside an expression can be removed, we ignore it and only add the right hand side of the assignment to the CFG. The right hand side is intermingled with its containing expression. The snippet below shows the expression we return to the CFG, when we process an assignment expression.

It is worth mentioning that not all variables can be removed. Function parameters and return variables should not be removed as they are an essential part of a contract’s structure. Public contract variables should also not be eliminated, for we can directly access them through accessor functions in blockchain transactions.

Solang has already got a reaching definitions implementation that serves to complement some Solidity specific optimizations. We can leverage it to identify undefined variables. A variable is undefined if it has been declared, but no initializes, and this non-initialized value reaches an instructions where the variable is read.

To find undefined variable usages, we traverse the CFG and check what definitions reach the variables referenced at the instruction we are analyzing. If any reference is undefined, we raise an error and stop compilation.

Undefined variable detection and unused variable elimination have utilized Solang’s existing infrastructure. The third milestone of my mentorship, common subexpression elimination is a completely new compiler pass, so I’ll be presenting it in another article.

Join Coinmonks Telegram Channel and Youtube Channel learn about crypto trading and investing

Also, Read

--

--