How do compilers work 2 — Middle end

Last week we had a look at what the front end of a compiler is doing. This week, we’ll take a look at what the middle end is doing to take the output of the front end and prepare it for the back end.

The middle end takes care of platform-independent optimisations of the code, readying it for conversion to machine code. This involves removing dead code, propagating constants, and potentially re-ordering code so that rarely executed code does not sit in the middle of frequently executed code.

In order to perform the middle end analysis of code, there are a few techniques that can be used. Static Single Assignment enables many optimisation techniques on imperative code. Continuation-Passing Style is used by functional languages and allows for easy Tail Call Optimisation and memoization.

Static Single Assignment (SSA)

This is done by giving each assignment to a variable a new name, e.g. assignments to x become x1, x2, x3, etc., and each usage refers to the specific assignment that reaches it. From this form, it is easy to see places where variables are assigned to but never used.

The sequence of assignments to and reads of a variable is known as its use-define chain.

Once Static Single Assignment has been performed over code, there are several optimisations that can be performed using this new form. All of these optimisations aim to remove code that is never executed, or to reduce the work done by the code that is.

Constant Folding

This involves finding computations in the code that are static and evaluating them once at compile time rather than every time at run time.

This can be done at a basic level on all code, but is enhanced if the code has undergone Static Single Assignment, as some values may be uncovered as never changing.

int secondsInAnHour = 60 * 60;
int secondsInADay = secondsInAnHour * 24;

only ever has one value, so the whole computation can be folded into:

int secondsInADay = 86400;

Dead code elimination

Dead code elimination finds:

  • Values that are assigned to but never read
  • Unreachable code (e.g. code after a return statement that is hit on every path)

And removes them. This makes the final machine code smaller, and potentially faster, as execution does not have to pass through irrelevant lines.

Code eliminated at this stage is unconditionally dead (i.e. can never be reached). Dynamic dead code elimination is performed at runtime by some environments; it detects code that cannot run, e.g. due to platform restrictions, and removes it.

Sparse conditional constant propagation

This optimisation considers code along with some context in order to eliminate certain cases of dead code. It does this by following branches of code, maintaining a record of the values that variables can take at that point.

If both branches of a conditional could be taken, both will be followed. Once interpretation of the code is complete, any lines that have not been visited are marked as dead code and removed.

int foo(int x) {
if (x < 0) { return 1; }
return -1;
}

In this case, if foo is only ever called with non-negative values, 1 will never be returned, but to find this out, the context in which the function is used needs to be considered.

Common Subexpression Elimination

This optimisation looks for recurring computations and replaces them with a single calculation which can be used in multiple places.

int foo = a + b + c;
int bar = a + b - d;

This can be optimised to the following, reducing computation at runtime:

int temp = a + b;
int foo = temp + c;
int bar = temp - d;

This is done using lexical matching and does not analyse the underlying values of the variables.

Global Value Numbering

This is similar to Common Subexpression Elimination, but rather than matching on character values, it aims to match on underlying values.

int foo = 10;
int bar = 10;
int a = foo + 10;
int b = bar + foo;

In this example, foo and bar always have the same value, and a and b always have the same value, so runtime computation can be saved (and there is then the potential for constant folding to further reduce the constants in the code).

int foo = 10;
int bar = foo;
int a = foo + 10;
int b = a;

Partial Redundancy Elimination

This optimisation is similar to Common Subexpression Elimination, but matches expressions that are lexically the same across different branches of the code.

if (a) {
int foo = bar * 10;
} else {
// Some code
int baz = bar * 10;
}

This can be optimised by noticing that both branches perform bar * 10. Therefore, the calculation can be hoisted and performed once, regardless of code path, as follows:

int temp = bar * 10;
if (a) {
int foo = temp;
} else {
// Some code
int baz = temp;
}

Continuation-Passing Style (CPS)

Functional compilers can use this style as an Intermediate Representation. To generate this style, every function is updated to include an extra parameter: the callback to invoke when the return value is ready. The advantage of this style is that it makes tail calls explicit (a call is a tail call when the current continuation is passed on unchanged), thus allowing for Tail Call Optimisation, which helps avoid stack overflow exceptions.

The continuations passed to each function are automatically generated by the compiler. Relatedly, some compilers (e.g. GHC for Haskell) wrap deferred computations in thunks and save a thunk's result the first time it is evaluated, preventing recalculation if it is needed again (a form of memoization).

Now that we’ve seen what the middle end is doing to optimise code, next week we’ll see what the back end is doing to finish off the process with code generation.
