Looking inside the GCC compiler

Luis Fernando Manrique Chavez
Feb 8, 2022

Why do we compile our code?

The C compilation process turns our source code into machine code that the computer can execute. Computers ultimately work with the binary system (zeros and ones), which represents the two states of electricity: on (1) and off (0).

To produce that binary form we use GCC. The name stands for GNU Compiler Collection; it is an optimizing compiler produced by the GNU Project that supports various programming languages, hardware architectures and operating systems.

How does the gcc compiler convert my code to machine code?

To become machine code, our code goes through four steps. Each one is explained briefly below, together with an example of how our code is transformed at that step, until it becomes code that only computers understand.

What are the compilation steps?

There are four compilation steps:

  • Preprocessor
  • Compiler
  • Assembler
  • Linker
Compilation Steps (Own Source)

Each step of the compilation is described in detail below:

1. Preprocessor

The first thing the C compilation process does is run the source code through the preprocessor. The source code is what we write in a text editor, in a file with the extension “.c”.

The preprocessor performs several tasks. The most important are taking the source code as input, removing the comments that have been placed in the code, expanding any macros it finds, and expanding included files. For example, if the program contains the directive #include <stdio.h>, the preprocessor interprets the directive and replaces it with the content of ‘stdio.h’.

What we get out of the preprocessor is expanded source code. To make the process stop after the preprocessor step, we use the command gcc -E main.c.

Here main.c is the name of the file that we are compiling, and its content is as follows:

Sample File — Own Source

By default, gcc -E writes its result to standard output. With the -o flag we can choose a file name for the result; in this case we use the name c. The result generated by the first step, preprocessing, shows how the source code changes, as we see in the following image:

$ gcc -E main.c -o c
Preprocessing result — Own Source

2. Compiler

In this step, the compiler converts the code expanded by the preprocessor into assembly code. To see this change in our source code, we use the -S option, which makes the compilation process stop after the compiler and before the assembler.

Recall that earlier we used the -o flag to give the result a name of our choosing. If we omit it, this step by default generates a file with the same name as the source file but with a .s extension, as we can see below:

$ gcc -S main.c
Compiler result — Own Source

3. Assembler

In this step, the assembler converts the assembly code to object code. To see the result of our source code after the assembler, we use the -c option, which by default generates a file named main.o. Displaying it with the cat command shows its content, which is binary and mostly unreadable as text, as in the following image.

$ gcc main.c -c
Assembler result — Own Source

4. Linker

In this final step, function calls are linked to their definitions. The linker also adds extra code to our program that is required when the program starts and ends.

To verify this last step we can run the commands size main.o and size main. The latter file, main, is the final result of the entire compilation process: an executable file (produced, for example, with gcc main.c -o main).

These commands show how the output grows from an object file to an executable file. The growth comes from the extra code that the linker adds to our program, and it is visible in the text column, which increases considerably from one step to the next.

Showing result after going through the linker

In this way, we have seen the step-by-step process that our source code goes through when we compile a C program.

Thanks to this, we can communicate with our computer without having to write all our code in binary, which would be a really awful idea.

Thanks for the attention.
