Understanding compilation

Carlos Barros
Feb 5, 2020


Creating an executable C program from source code is basically a three-stage process (see figure 1). In the first stage, the preprocessor prepares and modifies the source code for the compilation phase according to the preprocessing directives, which are the lines that start with the # symbol. One of its most important tasks is to copy the entire contents of each header named in an #include directive into the .c file that includes it.

Fig 1. Source: Beginning C++17, Ivor Horton and Peter Van Weert

Note that preprocessing takes place before the program is compiled. The preprocessor modifies the source text that makes up the program, so the preprocessing directives no longer exist in the code that the compiler actually sees.
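As a minimal sketch of what the preprocessor does, the fragment below uses three kinds of directives: an #include, two macro definitions and a conditional block. The names BUFFER_SIZE, SQUARE, USE_DOUBLE_BUFFER, TOTAL_SIZE and square_plus_total are hypothetical and only serve as illustration; they do not come from the hello.c file used in this article.

```c
#include <stddef.h>             /* replaced by the header's contents during preprocessing */

#define BUFFER_SIZE 16          /* object-like macro (hypothetical name) */
#define SQUARE(x) ((x) * (x))   /* function-like macro (hypothetical name) */

#ifdef USE_DOUBLE_BUFFER        /* conditional compilation: only one branch survives */
#define TOTAL_SIZE (2 * BUFFER_SIZE)
#else
#define TOTAL_SIZE BUFFER_SIZE
#endif

/* With USE_DOUBLE_BUFFER undefined, the compiler receives
 * `return ((n) * (n)) + 16;` because every macro has already
 * been expanded and every directive has been removed. */
int square_plus_total(int n)
{
    return SQUARE(n) + TOTAL_SIZE;
}
```

Running only the preprocessor on a file like this (for example with gcc's -E option) should show the expanded text without any of the lines that start with #define, #ifdef, #else or #endif.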

Let’s see this with an example. Starting from a previously created file called ‘hello.c’, we run the command ‘gcc hello.c -save-temps -o hello.exe’ in the Linux terminal. The -save-temps option keeps the intermediate .i, .s and .o files instead of deleting them:

Command ‘gcc hello.c -save-temps -o hello.exe’

The following image shows the files created by the previous command:

List of created files

In the second stage, the compiler translates the high-level instructions of each preprocessed file (.i) into low-level assembly language instructions. The generated file is in text format and typically has a .s file extension. The assembler then converts this file into a binary object file, which typically has a .o file extension.

Contents of the hello.i file

The hello.s file contains the generated assembly code:

Contents of the hello.s file

Then, in the third stage, the linker combines one or more binary object files into a single file. The generated file is also binary, contains the complete executable program, and usually has a .exe file extension on Windows (on Linux, executables typically have no extension).

Contents of the hello.o file

Strictly speaking, the term ‘compilation’ describes the first two stages, which operate on a single source code text file and generate a binary object file. It is during this process that the programmer finds out whether the source code contains syntax errors: if a parenthesis or semicolon is missing, the compiler reports it immediately and the compilation fails.

The linker, on the other hand, can operate on multiple object files and eventually generates a single executable file. Thanks to this, large programs can be built from modular object files, each containing functions that can be reused.
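The modular structure described above can be sketched as follows. The three file names (mathutils.h, mathutils.c, report.c) and the functions add and sum_of_three are hypothetical; they are shown in a single listing for brevity, whereas in practice each part would live in its own file.

```c
/* ---- mathutils.h (hypothetical): declaration shared by both .c files ---- */
int add(int a, int b);

/* ---- mathutils.c (hypothetical): compiled on its own into mathutils.o ---- */
int add(int a, int b)
{
    return a + b;
}

/* ---- report.c (hypothetical): compiled on its own into report.o ----
 * When this file is compiled separately, the compiler only knows the
 * declaration of add(); the linker later connects the calls below to
 * the definition stored in mathutils.o. */
int sum_of_three(int a, int b, int c)
{
    return add(add(a, b), c);
}
```

Assuming that layout, each .c file could be compiled separately with gcc's -c option, and the resulting object files, together with one that defines main, would then be passed to gcc to be linked into a single executable. Any other program that needs add can reuse mathutils.o without recompiling it.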

REFERENCES

Ivor Horton and Peter Van Weert, Beginning C++17

Verified Compilation for Shared-Memory C

Dynamic Compilation of C++ Template Code
