ℹ️ This article is based on Go 1.13.
The Go compiler is an important tool in the Go ecosystem, since compilation is one of the essential steps in building our programs into executable binaries. The compiler has had a long journey: it was first written in C and later moved to Go, and many optimizations and cleanups will keep happening in the future. Let’s take a high-level look at how it operates.
The Go compiler is composed of four phases that can be grouped into two categories:
- frontend. This category analyzes the source code and produces an abstract syntactic structure of it, called the AST (Abstract Syntax Tree).
- backend. This category transforms the representation of the source code into machine code, applying several optimizations along the way.
In order to better understand each phase, let’s use a simple program:
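The listing itself is not reproduced here, so the following is only a minimal sketch of a program that matches the rest of the walkthrough. It assumes two local variables, an if condition on the constant true, and an add function that prints a sum with println. The variable name a and the values 1 and 2 are assumptions made for illustration.

```go
package main

func main() {
	// Assumed values: any two integer constants would illustrate the same points.
	a := 1
	b := 2
	if true { // always-true condition, a candidate for branch removal
		add(a, b)
	}
}

// add prints the sum with the built-in println, which writes to standard error.
func add(a, b int) {
	println(a + b)
}
```

Saved as main.go, this sketch can be fed to the commands used in the rest of the article.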
The first phase is pretty straightforward and well explained in the documentation:
In the first phase of compilation, source code is tokenized (lexical analysis), parsed (syntax analysis), and a syntax tree is constructed for each source file.
The lexer will be the first package to run in order to tokenize the source code. Here is the output of the previous example tokenized:
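The compiler uses its own internal scanner, but the tokenizing step can be approximated with the standard library. The sketch below uses the go/scanner and go/token packages as stand-ins, so its output format is not the compiler's. The tokenize helper is a hypothetical name introduced for this example.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// tokenize returns one entry per token in src, formatted as the token
// name followed by its quoted literal text (empty for operators).
func tokenize(src string) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("main.go", fset.Base(), len(src))

	var s scanner.Scanner
	s.Init(file, []byte(src), nil, 0) // no error handler, comments skipped

	var out []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		out = append(out, fmt.Sprintf("%s %q", tok, lit))
	}
	return out
}

func main() {
	for _, t := range tokenize("println(a + b)") {
		fmt.Println(t)
	}
}
```

Identifiers carry their text as a literal, while operators and delimiters are fully described by the token kind alone.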
Once tokenized, the source is parsed and used to build a syntax tree.
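To get a feel for what a syntax tree contains, here is a small sketch that parses a file and walks its nodes. It relies on the standard go/parser and go/ast packages, which are separate from the parser inside the compiler, and the declsAndCalls helper is a hypothetical name. The source string is an assumed program with an add function called from main.

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// declsAndCalls parses src and returns the names of the declared functions
// and the names of the functions called through a plain identifier.
func declsAndCalls(src string) (decls, calls []string, err error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "main.go", src, 0)
	if err != nil {
		return nil, nil, err
	}
	// ast.Inspect walks the tree depth-first, in source order.
	ast.Inspect(file, func(n ast.Node) bool {
		switch x := n.(type) {
		case *ast.FuncDecl:
			decls = append(decls, x.Name.Name)
		case *ast.CallExpr:
			if id, ok := x.Fun.(*ast.Ident); ok {
				calls = append(calls, id.Name)
			}
		}
		return true
	})
	return decls, calls, nil
}

func main() {
	src := "package main\nfunc main() { add(1, 2) }\nfunc add(a, b int) { println(a + b) }\n"
	decls, calls, err := declsAndCalls(src)
	if err != nil {
		panic(err)
	}
	fmt.Println("declared:", decls)
	fmt.Println("called:", calls)
}
```

Call nodes are what an inlining decision operates on: a call that has been inlined no longer appears as a call in the tree.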
The resulting Abstract Syntax Tree can be displayed with the command go tool compile and its flag for dumping the syntax tree.
This phase also includes optimizations such as inlining. In our example, the function add has already been inlined, since we do not see any CALLFUNC instruction to add. Let’s run the command again with the flag -l, which disables inlining:
Once the AST is generated, the compiler can move to a lower-level intermediate representation, the SSA form.
The Static Single Assignment (SSA) form is the phase where optimizations happen: dead code elimination, removal of unused branches, replacement of some expressions with constant values, etc.
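Two of these optimizations, constant folding and dead code elimination, can be illustrated on a toy representation. The sketch below is not the compiler's implementation, which lives in the cmd/compile/internal/ssa package. The types, opcodes, and value IDs are all hypothetical and kept deliberately small.

```go
package main

import "fmt"

// opcode is the kind of a toy SSA value.
type opcode int

const (
	opConst opcode = iota // integer constant stored in aux
	opAdd                 // sum of the two values listed in args
	opPrint               // side effect on args[0]; never removed
)

// value is defined exactly once and referenced by its id.
type value struct {
	id   int
	op   opcode
	aux  int
	args []int
}

func index(vals []*value) map[int]*value {
	byID := make(map[int]*value, len(vals))
	for _, v := range vals {
		byID[v.id] = v
	}
	return byID
}

// foldConstants turns an addition of two constants into a constant.
func foldConstants(vals []*value) {
	byID := index(vals)
	for _, v := range vals {
		if v.op != opAdd {
			continue
		}
		x, y := byID[v.args[0]], byID[v.args[1]]
		if x.op == opConst && y.op == opConst {
			v.op, v.aux, v.args = opConst, x.aux+y.aux, nil
		}
	}
}

// removeDead keeps only the values reachable from a side effect.
func removeDead(vals []*value) []*value {
	byID := index(vals)
	live := map[int]bool{}
	var mark func(v *value)
	mark = func(v *value) {
		if live[v.id] {
			return
		}
		live[v.id] = true
		for _, a := range v.args {
			mark(byID[a])
		}
	}
	for _, v := range vals {
		if v.op == opPrint {
			mark(v)
		}
	}
	var kept []*value
	for _, v := range vals {
		if live[v.id] {
			kept = append(kept, v)
		}
	}
	return kept
}

func main() {
	// v1 = 1; v2 = 2; v3 = v1 + v2; v4 = print v3
	vals := []*value{
		{id: 1, op: opConst, aux: 1},
		{id: 2, op: opConst, aux: 2},
		{id: 3, op: opAdd, args: []int{1, 2}},
		{id: 4, op: opPrint, args: []int{3}},
	}
	foldConstants(vals)
	vals = removeDead(vals)
	for _, v := range vals {
		fmt.Printf("v%d op=%d aux=%d args=%v\n", v.id, v.op, v.aux, v.args)
	}
}
```

Folding first and removing dead values second mirrors the order described in this article: once the addition no longer references its operands, nothing keeps them alive.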
The SSA code can be dumped with the command GOSSAFUNC=main go tool compile main.go && open ssa.html, which produces an HTML document with all the different passes performed by the SSA package:
The generated SSA is shown in the “start” tab:
The variables, b among them, are highlighted here, along with the if condition, which will allow us to see later how those lines are changed. The code also shows how the compiler handles the println function, which is decomposed into 4 steps, the last one being printunlock. The compiler automatically adds a lock for us and, according to the type of the argument, calls the related method to print it correctly.
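The shape of that decomposition can be sketched in ordinary Go. In the Go runtime the corresponding functions are printlock, a typed printer such as printint, printnl, and printunlock. The code below is only an analogy with hypothetical names. It also picks the printer with a type switch at run time, whereas the compiler makes that choice at compile time.

```go
package main

import (
	"fmt"
	"io"
	"os"
	"strings"
	"sync"
)

// printMu plays the role of the print lock: it keeps the pieces of one
// call from interleaving with output from other goroutines.
var printMu sync.Mutex

// sketchPrintln mimics the four steps: lock, typed print, newline, unlock.
func sketchPrintln(w io.Writer, args ...interface{}) {
	printMu.Lock()         // step 1: take the lock
	defer printMu.Unlock() // step 4: release it when the function returns
	for i, arg := range args {
		if i > 0 {
			io.WriteString(w, " ") // operands are separated by a space
		}
		switch v := arg.(type) { // step 2: choose the printer from the type
		case int:
			fmt.Fprintf(w, "%d", v)
		case string:
			io.WriteString(w, v)
		case bool:
			fmt.Fprintf(w, "%t", v)
		default:
			fmt.Fprintf(w, "%v", v)
		}
	}
	io.WriteString(w, "\n") // step 3: terminate the line
}

// render captures the output in a string, which is convenient for testing.
func render(args ...interface{}) string {
	var sb strings.Builder
	sketchPrintln(&sb, args...)
	return sb.String()
}

func main() {
	sketchPrintln(os.Stderr, 1+2) // println also writes to standard error
}
```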
In our example, since the operands, b included, are known at compile time, the compiler can calculate the final result and mark the variables as no longer necessary. The opt pass optimizes this part:
v11 has been replaced here by the result of the addition, and its operands, v5 among them, have been marked as dead code. The opt deadcode pass then removes that code:
As for the if condition, the opt pass marks the constant true as dead code, and it is then removed:
Then another pass simplifies the control flow by marking the unnecessary block and condition as invalid. Those blocks are later removed by another pass dedicated to dead code:
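The combined effect of these passes can be pictured at the source level. This is only an analogy: the compiler rewrites SSA values and blocks, not Go source, and the function names and the constants 1 and 2 are assumptions. The sum is returned rather than printed so that the two versions can be compared.

```go
package main

// before has two constants, an always-true branch, and an addition.
func before() int {
	a := 1
	b := 2
	if true {
		return a + b
	}
	return 0 // never reached, but required for the function to compile
}

// after is what conceptually remains once the addition is folded and the
// condition, the unused variables, and the dead block are removed.
func after() int {
	return 3
}

func main() {
	println(before(), after())
}
```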
Once all the passes are done, the Go compiler generates intermediate assembly code:
The next phase generates the machine code and writes it into the binary file.
Machine code generation
The last step of the compiler is the generation of the object file, main.o in our example. This file can then be disassembled with the objdump tool, which does the reverse process. Here is a nice diagram created by Grant Seltzer Richman:
You can find more information about the object file and binaries in “Dissecting Go Binaries.”
Once the object file is generated, it can be passed directly to the linker with the command go tool link, and your binary will finally be ready.