Implementing SSA compilers with Rust

Bits Of Thought
10 min read · Mar 20, 2023


As part of a typical undergraduate Computer Science curriculum, most students learn about a compiler’s front end — i.e. scanning and parsing, sometimes followed by interpreting the AST (abstract syntax tree). This series continues past a prototypical front end and shows how a compiler’s middle and back ends (including optimization, SSA construction, register allocation, and code generation) can be implemented. In this first article, we will generate SSA bytecode from a program’s parse tree!

The source code referred to can be found in full on GitHub.

Prerequisites

Some general knowledge of compiler design, such as that taught in an introductory compiler course, helps in understanding everything the article covers.

Getting started

We already wrote the compiler’s front end/parser for a minimal imperative toy language. Our parser generates a simple syntax tree and performs type checking and name resolution of the program. I implemented the parser using the Rust ports of yacc and lex included in grmtools. The parser’s details are beyond our scope, but you can find its implementation in language.l, language.y, and parser.rs. The syntax of our language is as follows:

program = { var_def | function_def } ;

var_def = identifier "::" type "=" expr ;

number = ["-"] digit { digit } ;
digit = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;

identifier = (* { "A"-"Z" | "a"-"z" | "0"-"9" } *) ;
call = identifier argument_list ;

type = "Int" | "Bool"

function_def = "lambda " identifier parameter_list "::" type body;
parameter_list = "(" [identifier "::" type] {"," identifier "::" type} ")" ;
argument_list = "(" [expr] {"," expr} ")" ;
body = "{" [stmt] {";" stmt} "}" ;
stmt = var_def
| assign
| ifelse
| while
| return ;
assign = identifier "=" expr ;
ifelse = "if" expr "then" body "else" body ;
while = "while" expr "do" body ;
return = "return" expr ;

expr = expr "or" expr
| expr "xor" expr
| term ;
term = term "and" bterm
| bterm ;
bterm = "not" cterm
| cterm ;
cterm = cterm ">=" factor
| cterm "<" factor
| cterm "==" factor
| factor ;
factor = factor "+" atom
| factor "-" atom
| atom ;
atom = atom "*" unit
| atom "/" unit
| unit ;
unit = identifier | "true" | "false" | call | "(" expr ")" | number ;

Our first step is to transform the AST into a linear virtual instruction set used by the rest of the compiler. By traversing the tree in depth-first post-order, we ensure that the code computing each operation’s dependencies is emitted before the code for the operation itself. In the below example, we begin by evaluating 5. When evaluating var_def, we have already generated code for the expression 5 and moved the value to t, which we then copy into x. The redundant moves between registers will later be eliminated by our optimizer.

AST traversal during linearization; indices show evaluation order
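
To make the traversal concrete: the definition x :: Int = 5 from the figure might lower to the following two instructions (register names illustrative, not the compiler’s exact output):

li t, #5
mv x, t

The mv is exactly the kind of redundant copy the optimizer will later remove.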

In our code, the heavy lifting of IR code generation is done by the recursive function translate_block in ir.rs. It receives a Vec to fill with operations, a nested table structure for variable bindings, the current instruction to process, and a generator for creating fresh virtual register names. The function is implemented as one large match expression with specialized code for every type of AST node we may encounter. To visualize, I have included the code for IfElse, True, and False below.

fn translate_block(
    vec: &mut Vec<Operator>,
    scope: &mut SheafTable<Rc<str>, Scope>,
    instr: parser_defs::Any<'_>,
    gen: &mut VRegGenerator,
) -> Option<VReg> {
    match instr {
        //...
        Any::S(s) => match s {
            parser_defs::Statement::IfElse(c, t, e) => {
                // we first translate the condition c (post-order)
                let c = translate_block(vec, scope, Any::E(c), gen).unwrap();
                // we allocate new labels for branching
                let l1: Rc<_> = gen.next_label().into();
                let l2: Rc<_> = gen.next_label().into();
                let l3: Rc<_> = gen.next_label().into();
                let next = gen.next_reg();
                vec.push(Operator::Li(next, 1));
                // if c then branch to l1, else to l2
                vec.push(Operator::Beq(c, next, Rc::clone(&l1), Rc::clone(&l2)));
                vec.push(Operator::Label(Rc::clone(&l1)));
                // translate the then block
                translate_block(vec, scope, Any::B(t), gen);
                vec.push(Operator::J(Rc::clone(&l3)));
                vec.push(Operator::Label(Rc::clone(&l2)));
                // translate the else block
                translate_block(vec, scope, Any::B(e), gen);
                vec.push(Operator::J(Rc::clone(&l3)));
                // l3 is the jump target for both blocks
                vec.push(Operator::Label(Rc::clone(&l3)));
                // we didn't evaluate an expression, so no result (VReg)
                None
            }
            //...
        },
        //...
        Any::U(u) => match u {
            //...
            parser_defs::Unit::True => {
                let res = gen.next_reg();
                // true = 1
                vec.push(Operator::Li(res, 1));
                Some(res)
            }
            parser_defs::Unit::False => {
                let res = gen.next_reg();
                // false = 0
                vec.push(Operator::Li(res, 0));
                Some(res)
            }
        },
        _ => panic!("Unexpected instr"),
    }
}
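
The generator is only responsible for handing out fresh names. As a rough sketch (the actual VRegGenerator lives in ir.rs and may differ; this assumes VReg is an alias for u32, consistent with the code later in the article), it could look like this:

// A minimal sketch of a fresh-name generator, not the article's exact
// implementation: it hands out increasing register numbers and labels.
pub struct VRegGenerator {
    next_reg: u32,
    next_label: u32,
}

impl VRegGenerator {
    pub fn new() -> Self {
        Self { next_reg: 0, next_label: 0 }
    }

    // returns a fresh virtual register number
    pub fn next_reg(&mut self) -> u32 {
        let r = self.next_reg;
        self.next_reg += 1;
        r
    }

    // returns a fresh label; `.into()` at the call sites converts
    // the String into an Rc<str>
    pub fn next_label(&mut self) -> String {
        let l = format!("_LABEL_{}", self.next_label);
        self.next_label += 1;
        l
    }
}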

Using our new code generator, we can transform the following program into virtual, linear instructions:

myvar3 :: Bool = false;
lambda myfun(myvar3 :: Int) :: Int {
    myvar4 :: Int = 0;
    i :: Int = 100;
    while (i >= 0) do {
        if i / 2 / 2 / 2 < 2 then {
            myvar4 = myvar4 * 3;
        } else {
            myvar4 = myvar4 / 2;
        }
    }
    return myvar4;
}
myvar2 :: Bool = true;

======>

li rd1, #0
li rd2, #100
j @_LABEL_0
@_LABEL_0:
li rd4, #0
slt rd5, rd2, rd4
li rd6, #1
xor rd7, rd6, rd5
li rd3, #0
beq rd7, rd3, @_LABEL_1, @_LABEL_2
@_LABEL_2:
li rd8, #2
div rd9, rd2, rd8
li rd10, #2
div rd11, rd9, rd10
li rd12, #2
div rd13, rd11, rd12
li rd14, #2
slt rd15, rd13, rd14
li rd16, #1
beq rd15, rd16, @_LABEL_3, @_LABEL_4
@_LABEL_3:
li rd17, #3
mult rd18, rd1, rd17
mv rd1, rd18
j @_LABEL_5
@_LABEL_4:
li rd19, #2
div rd20, rd1, rd19
mv rd1, rd20
j @_LABEL_5
@_LABEL_5:
j @_LABEL_0
@_LABEL_1:
return rd1
@_LABEL_6:

Constructing the Control-Flow Graph

To optimize our abstract bytecode, we need to reason about its flow of values. This is best achieved by representing the program as a control-flow graph (CFG). A CFG has one node per basic block of the bytecode, i.e. per maximal straight-line sequence ending with a jump, return, or branch. Two nodes are connected by an edge (n1, n2) if n1 ends with a jump or branch to n2; because our instruction set only has unconditional jumps and two-target conditional branches, each node has an out-degree between 0 and 2. In the bytecode above, for instance, the block starting at @_LABEL_0 ends with a beq and therefore has the two successors @_LABEL_1 and @_LABEL_2.

CFG of myfun

The above graph shows the CFG of the program from our previous example. The “dom tree” and “postdom tree” capture dominance properties of the graph, which are important for later SSA construction. The CFG itself is easy to construct in two passes over the bytecode, but calculating dominance properties is trickier and requires a data-flow analysis on the CFG.

In Rust, we represent CFGs as follows, making them generic over their operations:

pub struct CFG<O> {
    blocks: Vec<Block<O>>,
    entry: usize,
    exit: usize,
}

pub struct Block<O> {
    pub label: Rc<str>,
    pub body: Vec<O>,
    pub preds: Vec<usize>,
    pub children: Vec<usize>,
    /// Idom of dominator tree.
    /// None if entry node, else Some.
    pub idom: Option<usize>,
    pub idom_of: Vec<usize>,
    pub r_idom: Option<usize>,
    pub r_idom_of: Vec<usize>,
}

The function from_linear goes over the entire bytecode in an initial pass to pre-allocate blocks that it discovers. Afterward, a second pass copies each basic block’s code into the corresponding Block and adds edges when a branch is found:

pub fn from_linear(
    code: impl AsRef<[Operator]>,
    params: impl AsRef<[VReg]>,
    max_reg: VReg,
) -> Self {
    let code = code.as_ref();
    // 'labels' helps us find the blocks of branch targets
    let mut labels: HashMap<Rc<str>, usize> = HashMap::new();
    let mut blocks = Vec::from([Block::empty(&Rc::from("ENTRY"))]);
    let mut i = blocks.len();
    // initial pass
    for op in code.iter() {
        if let Operator::Label(s) = op {
            labels.insert(Rc::clone(s), i);
            i += 1;
            blocks.push(Block::empty(s));
        }
    }
    blocks.push(Block::empty(&Rc::from("EXIT")));
    let exit_idx = blocks.len() - 1;
    let mut start = 0;
    let mut block_idx = 0;
    // second pass
    for (i, op) in code.iter().enumerate() {
        match op {
            Operator::Label(_) => {
                let block = &mut blocks[block_idx];
                block.body = Vec::from(&code[start..i]);
                if block.children.is_empty() {
                    block.children.push(exit_idx); //exit
                    blocks[exit_idx].preds.push(block_idx);
                }
                block_idx += 1;
                start = i + 1;
            }
            Operator::J(s) => {
                let block = &mut blocks[block_idx];
                let target = *labels.get(s.as_ref()).unwrap();
                if !block.children.contains(&target) {
                    block.children.push(target);
                    blocks[target].preds.push(block_idx);
                }
            }
            Operator::Beq(_, _, t, f)
            | Operator::Bl(_, _, t, f)
            | Operator::Bgt(_, _, t, f) => {
                for s in [t, f] {
                    let block = &mut blocks[block_idx];
                    let target = *labels.get(s.as_ref()).unwrap();
                    if !block.children.contains(&target) {
                        block.children.push(target);
                        blocks[target].preds.push(block_idx);
                    }
                }
            }
            _ => {}
        }
    }
    // the final block's body runs to the end of the code
    blocks[block_idx].body = Vec::from(&code[start..]);
    // ...
}
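
Lowering the linear bytecode is then a single call. Hypothetical usage, assuming ops, params, and max_reg were produced by the code generation step:

// Hypothetical usage: build a CFG from the linear bytecode.
let cfg = CFG::from_linear(&ops, &params, max_reg);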

Dominance

A node n1 dominates n2 iff every path from the CFG’s entry to n2 must pass through n1. The immediate dominator (idom) of a node is the closest node, other than itself, that dominates it. Because dominance is a transitive property, we can capture each node’s dominators in a “dominance tree” — such as the one from our example. In such a dom tree, every node is a child of its idom. Postdominance has the same properties as dominance, but on the reverse CFG: node n1 postdominates n2 iff every path from n2 to the exit must pass through n1.

We calculate dominance sets by solving data-flow equations. Data-flow equations are recursively defined and can be solved by fixed-point algorithms.
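
For dominance, the equations are the classic ones below, where e is the entry node and preds(n) are the predecessors of n:

Dom(e) = { e }
Dom(n) = { n } ∪ ⋂ { Dom(p) : p ∈ preds(n) }    for n ≠ e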

In general, there are many possible solutions to fixed-point equations. For finding dominance properties, we want the maximal fixed point, so that dominance is correctly captured in loops. We achieve this by initializing every set to contain all nodes in the graph; the sets are then iteratively shrunk according to the equations until we reach a fixed point.
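
The iteration itself fits in a few lines. Below is a minimal sketch of such a fixed-point solver over dominance sets (illustrative only; it is not the article’s calculate_idoms, which derives immediate dominators — preds here holds each block’s predecessor indices):

use std::collections::HashSet;

// A minimal fixed-point sketch: every Dom set starts "full"
// and shrinks until nothing changes (maximal fixed point).
fn dominance_sets(preds: &[Vec<usize>], entry: usize) -> Vec<HashSet<usize>> {
    let all: HashSet<usize> = (0..preds.len()).collect();
    let mut dom = vec![all; preds.len()];
    dom[entry] = HashSet::from([entry]);
    let mut changed = true;
    while changed {
        changed = false;
        for n in 0..preds.len() {
            if n == entry {
                continue;
            }
            // Dom(n) = {n} ∪ ⋂ Dom(p) over predecessors p
            let mut new: HashSet<usize> = preds[n]
                .iter()
                .map(|&p| dom[p].clone())
                .reduce(|a, b| a.intersection(&b).cloned().collect())
                .unwrap_or_default();
            new.insert(n);
            if new != dom[n] {
                dom[n] = new;
                changed = true;
            }
        }
    }
    dom
}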

I won’t go over the entire code that calculates our dominator trees — if you are interested, you can check out the function calculate_idoms in ir.rs!

SSA Construction

The first thing we do to our CFG is transform it into static single assignment form (SSA). SSA is useful for the rest of our compiler because it captures data-flow information directly in the program’s namespace and thus simplifies reasoning about the program. To be in SSA form, our bytecode must satisfy the following properties:

  • Each variable is defined at exactly one point in the program (see the example below).
  • Each use of a variable is dominated by its definition (dominance property).
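
As a small illustration of the first property (names invented), renaming straight-line code gives every definition its own register:

x = 1          x1 = 1
x = x + 1  =>  x2 = x1 + 1
y = x * 2      y1 = x2 * 2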

This is achieved by inserting “phi functions” at merge points of the CFG. A phi function receives one operand per predecessor of the block it is in and — semantically — returns the value flowing in along the edge taken during execution.
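
For a conditional, this looks as follows (register names illustrative): each arm defines its own version, and the merge block selects between them:

@then:
  x1 <- 1
  j @merge
@else:
  x2 <- 2
  j @merge
@merge:
  x3 <- Φ(x1, x2)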

SSA form is constructed using a two-phase approach. First, we generate phi functions at required program points. Second, we rename virtual registers so that every definition is unique.

Phase 1: Phi insertion

We overapproximate the set of nodes that require a phi function for a given definition by inserting phi functions in every block of the definition’s dominance frontier. The dominance frontier can be thought of as every block “b” that is “just outside” the range dominated by a block — i.e. a predecessor of b is dominated, but b isn’t. In the example CFG for myfun, the dominance frontier of _Label_5 is _Label_0: _Label_5 dominates the back-edge predecessor of _Label_0, but not _Label_0 itself. The below snippet implements our phi insertion:

pub fn to_ssa(mut self) -> CFG<SSAOperator> {
    let frontiers = self.get_dominance_frontiers();
    let (globals, defined_in) = self.get_global_regs();
    let mut phis = vec![vec![]; self.blocks.len()];
    // phase 1: generate phi functions
    for &global in globals.iter() {
        let mut queue: Vec<_> = defined_in[&global].clone();
        while let Some(next) = queue.pop() {
            for &succ in &frontiers[next] {
                let entry = SSAOperator::Phi(
                    global,
                    vec![global; self.blocks[succ].preds.len()],
                );
                if !phis[succ].contains(&entry) {
                    phis[succ].push(entry);
                    queue.push(succ);
                }
            }
        }
    }
    // ...
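
The frontiers used above come from get_dominance_frontiers. A standard way to compute them is the dominator-tree walk due to Cooper, Harvey, and Kennedy; the sketch below assumes idoms are already computed and is not necessarily the article’s implementation:

// Sketch: for every join point b, walk up the dominator tree from each
// predecessor until reaching b's idom; every block visited on the way
// has b in its dominance frontier.
fn dominance_frontiers(preds: &[Vec<usize>], idom: &[Option<usize>]) -> Vec<Vec<usize>> {
    let mut df = vec![Vec::new(); preds.len()];
    for b in 0..preds.len() {
        if preds[b].len() < 2 {
            continue; // only join points contribute to frontiers
        }
        for &p in &preds[b] {
            let mut runner = p;
            while Some(runner) != idom[b] {
                if !df[runner].contains(&b) {
                    df[runner].push(b);
                }
                match idom[runner] {
                    Some(d) => runner = d,
                    None => break, // reached the entry node
                }
            }
        }
    }
    df
}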

Phase 2: Register renaming

Once we have inserted all phi functions, we can rename registers so that every use refers to a single definition in the program. We perform renaming in a single pass over the entire program, using a depth-first recursion on the dominator tree. We allocate a fresh name for every definition we encounter and rewrite every use with the most recently allocated name. After we have processed all instructions of a block, we also rewrite the phi functions of the block’s successors, updating the operand corresponding to our incoming edge from its original name to the rewritten one. Register renaming is performed by rename_blocks:

fn rename_blocks(
    &mut self,
    block: usize,
    globals: &Vec<VReg>,
    // names stores a stack of allocated names per old register name
    names: &mut HashMap<VReg, Vec<VReg>>,
    generator: &mut VRegGenerator,
    // phis contains the phi functions from phase 1
    phis: &mut Vec<Vec<SSAOperator>>,
) {
    let current = block;
    // we first push a new name for every global
    for &global in globals.iter() {
        let last = names.entry(global).or_default();
        let next = last.last().cloned().unwrap_or(u32::MAX); // undefined
        last.push(next);
    }
    // we then rewrite the **definition** of each phi function
    for op in &mut phis[current] {
        if let SSAOperator::Phi(rec, _) = op {
            let old = *rec;
            *(names.get_mut(&old).unwrap().last_mut().unwrap()) = generator.next_reg();
            *rec = *names[&old].last().unwrap();
        }
    }
    // ...
    for op in &mut self.blocks[current].body {
        match op {
            Operator::Or(x, y, z) => {
                // rewrite y, z with the active name
                update_name!(y);
                update_name!(z);
                // allocate a new name for x
                set_name!(x);
            }
            // ...
            _ => {}
        }
    }
    // we update the phi operands of the block's successors
    for &child in self.blocks[current].children.iter() {
        for phi in &mut phis[child] {
            if let SSAOperator::Phi(_, args) = phi {
                let selfpos = self.blocks[child]
                    .preds
                    .iter()
                    .position(|&v| v == current)
                    .unwrap();

                update_name!(&mut args[selfpos]);
            }
        }
    }
    // we recurse on the dom tree
    for idomsucc in self.blocks[current].idom_of.clone() {
        self.rename_blocks(idomsucc, globals, names, generator, phis);
    }
    // we free all the names we allocated in the current subtree
    for global in globals.iter() {
        names.get_mut(global).unwrap().pop();
    }
}

After both phases, our CFG is transformed into the following. Notice the phi functions in _Label_0 and _Label_5.
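
As a rough illustration of the two merge points (register numbers invented, not the compiler’s actual output), the affected blocks now begin like this:

@_LABEL_0:
  rd21 <- Φ(rd1, rd24)   ; myvar4, merged from ENTRY and @_LABEL_5
  ...
@_LABEL_5:
  rd24 <- Φ(rd22, rd23)  ; myvar4, from the then and else arms
  ...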

Summary

We extended our compiler’s front end with linear code generation, CFG construction, and SSA construction. In the next part, we will make direct use of SSA to implement an optimization that removes redundancies in our programs: GVN-PRE!

Stay tuned!

Further reading

The SSA Book
