Compilation steps

Published in Nerd For Tech on Jun 21, 2021

Here you will understand where those extra lines come from.

Compilation is the process of translating a high-level, human-readable programming language into a language that the computer can understand: ones and zeros, saved in an executable file. The program that carries out this process is called a ‘compiler’.

The GCC compiler can translate languages like C, C++, Objective-C, and Objective-C++. Here we talk only about C compilation.

Basic usage

To use GCC you need to write the name of the program (gcc) and the name of the source file in the command line.

$ gcc sourcecode.c

If the code doesn’t have problems, GCC is going to create an executable file with the predefined name “a.out”, which you can execute like this:

$ ./a.out

Name your file

To give it a personalized name you can use the flag -o followed by the new name, like this:

$ gcc sourcecode.c -o newname

The flag -o is often written at the end of the command line, but it can go anywhere as long as the output name comes right after it.

Flags to review the code

You may want to add some flags to be sure that your code not only works but is also well written. Some of them are:

-Wall enables the most common compiler warning messages.
-Werror treats all the warnings as errors, so the compilation stops.
-Wextra enables some additional warnings that are not included in the flag -Wall.
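
As a small illustration of what these flags catch, here is a sketch of a function that compiles without flags but should produce warnings when compiled with -Wall and -Wextra together. The function count_positive and its parameters are made up for this example.

```c
#include <stddef.h>

/* count_positive: hypothetical helper, counts the values greater than zero */
int count_positive(const int *values, size_t length, int verbose)
{
    int unused_total = 0;   /* never used: reported by -Wall (-Wunused-variable) */
    int count = 0;
    int i;

    /* 'verbose' is never used: reported with -Wall -Wextra (-Wunused-parameter) */
    for (i = 0; i < length; i++)   /* int compared with size_t: -Wextra (-Wsign-compare) */
    {
        if (values[i] > 0)
            count++;
    }
    return (count);
}
```

The code still works, which is the point: the warnings show places where it could be better written. Adding -Werror would turn each of them into an error and no output file would be created.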

Steps of compilation

Now, we are going to describe the compilation process step by step:

1. Preprocessor

The first thing that GCC does is run the PRE-PROCESSOR. It takes the source code, removes the comments, includes headers, and replaces macros with code (keep reading to understand these). You can ask GCC to stop at the end of this stage with the flag “-E”, which prints the preprocessed code to the standard output.

$ gcc -E file.c

Comments let people understand what your functions need as input (parameters), what the function does with these parameters, and what it returns in case of success or failure. It means that someone else who needs to use that function can read those comments and be ready to use it. They look like this:

/**
 * function_name - description of the process made by the function
 * @parameter: The description of the parameter
 * Return: Description of the value returned
 * if the function achieves its goal, or if it fails
 */

Comments can be inside the function, too. They can explain what the code does, or they can be used to leave some lines out of the compilation.

c = (a < b) ? a : b; /* a comment here can explain this line */
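
To put both kinds of comments together, here is a minimal sketch of a documented function. The name smaller_of is an assumption made for this example; the preprocessor removes every comment in it, including the line that has been commented out.

```c
/**
 * smaller_of - returns the smaller of two integers
 * @a: first integer
 * @b: second integer
 * Return: a if a is less than b, otherwise b
 */
int smaller_of(int a, int b)
{
    int c;

    c = (a < b) ? a : b; /* pick a when it is smaller, otherwise b */
    /* c = c * 2; */     /* commented out: this line is never compiled */
    return (c);
}
```

Running gcc -E on a file with this function should show the same code with the comments gone.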

A header file tells the compiler how to call some functionality: the name of the function, the inputs it needs, and the output it generates. But headers don’t include the process carried out between the inputs and the outputs. Think of the functions as your friends and the header as an address book where you save your friends’ numbers, so you can call them whenever you want. Your own headers usually live in the same directory as the source code, while system headers such as stdio.h live in the compiler’s include directories. On the inside, header files are human-readable, with the same syntax as the source code.

int _putchar(char c);

This is our new friend, the function ‘_putchar’. To call it we need to give it a char, and it is going to return an integer (a number). The body of the function is going to come from the object files or libraries added in the linking step.
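
The separation between prototype and body can be sketched in a single file. Here the prototype stands in for what would normally live in a header, print_ok is a hypothetical caller, and the body of _putchar is only one possible definition, assumed to use the system call wrapper write on the standard output.

```c
#include <unistd.h>

/* Prototype: what would normally be written in a header file */
int _putchar(char c);

/* print_ok: hypothetical caller that knows only the prototype above */
int print_ok(void)
{
    int written = 0;

    written += _putchar('O');
    written += _putchar('K');
    written += _putchar('\n');
    return (written);
}

/* Definition: an assumed body that writes one byte to standard output (fd 1) */
int _putchar(char c)
{
    return ((int)write(1, &c, 1));
}
```

The compiler accepts the calls inside print_ok before it has seen the body of _putchar, because the prototype already tells it the input and the output.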

A macro is a fragment of code that is given a name. Macros can define constants that are used throughout the program and cannot be changed while it runs, like the constant PI in the first example. They can also define a kind of special function. The second example defines how to get the area of a circle of radius r.

#define PI 3.14
#define circleArea(r) (3.1415*(r)*(r))
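
Because the preprocessor replaces macros as plain text, the parentheses around r matter. The sketch below compares circleArea with a hypothetical variant called unsafeArea, written without those parentheses only to show the difference.

```c
#define circleArea(r) (3.1415*(r)*(r))
/* Hypothetical variant without parentheses around r, for comparison only */
#define unsafeArea(r) (3.1415*r*r)

/* circleArea(1 + 1) expands to (3.1415*(1 + 1)*(1 + 1)), about 12.566 */
double area_with_parentheses(void)
{
    return (circleArea(1 + 1));
}

/* unsafeArea(1 + 1) expands to (3.1415*1 + 1*1 + 1), about 5.1415 */
double area_without_parentheses(void)
{
    return (unsafeArea(1 + 1));
}
```

Both functions ask for the area of a circle of radius 2, but only the first one gets it, because the second expansion mixes the additions with the multiplications.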

2. Compiling

The output of the preprocessor is received and transformed into assembly code, a human-readable language that is a little bit harder than C. Let's see an example of the same function written in C, and written in assembly:

int main(void)
{
    return (1);
}

The assembly code gives more specific and detailed instructions than a more human-readable language.

$ cat example.s
        .file   "example.c"
        .text
        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        pushq   %rbp
        .cfi_def_cfa_offset 16
        .cfi_offset 6, -16
        movq    %rsp, %rbp
        .cfi_def_cfa_register 6
        movl    $1, %eax
        popq    %rbp
        .cfi_def_cfa 7, 8
        ret
        .cfi_endproc
.LFE0:
        .size   main, .-main
        .ident  "GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0"
        .section        .note.GNU-stack,"",@progbits

To see your own functions translated to assembly code, use the flag -S. GCC stops after this stage and saves the result in a file with the extension ‘.s’.

$ gcc -S filename.c

3. Assembly

The assembler translates the assembly code into binary machine code.

$ gcc -c filename.c

With the flag -c, GCC stops after this stage and generates a file with the extension ‘.o’, known as ‘object code’. It is a binary file, but an editor such as emacs, vim, or nano, or a command like cat, will not show the ones and zeros; it will show something like this:

$ cat example.o
ELF>@@UH��]�GCC: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0zRx
F                                                       A�C
 ��
                 
                   example.cmain .symtab.strtab.shstrtab.text.data.bss.comment.note.GNU-                                                                                       -stack.rela.eh_frame

If you want to see the file in zeros and ones, you can use the command xxd:

$ xxd -b example.o

It is a little bit long: 204 lines, just to return the number 1.
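
The letters ‘ELF’ at the start of the cat output are not noise. On Linux, object files use the ELF format, which begins with the byte 0x7f followed by the characters E, L, and F. The sketch below checks for those four bytes; the function names has_elf_magic and file_is_elf are assumptions made for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* has_elf_magic: returns 1 if the buffer starts with 0x7f 'E' 'L' 'F', else 0 */
int has_elf_magic(const unsigned char *bytes, size_t length)
{
    if (length < 4)
        return (0);
    return (bytes[0] == 0x7f && bytes[1] == 'E' &&
            bytes[2] == 'L' && bytes[3] == 'F');
}

/* file_is_elf: reads the first four bytes of a file and checks them */
int file_is_elf(const char *path)
{
    unsigned char header[4];
    size_t count;
    FILE *file = fopen(path, "rb");

    if (file == NULL)
        return (0);
    count = fread(header, 1, sizeof(header), file);
    fclose(file);
    return (has_elf_magic(header, count));
}
```

Calling file_is_elf("example.o") on the object file from this stage should return 1 on a Linux system, while calling it on example.c should return 0.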

4. Linking

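Finally, the linker combines your object code with the function definitions and startup code required to run the program, creating the final executable file. Environment variables are not added at this stage; the program receives them from the operating system when it runs. Keep reading to see what these mean.

To run every stage, including the linking phase, you can compile without any of the previous flags:

$ gcc filename.c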

By default, it will create a new executable file called a.out. Execute it like this:

$ ./a.out

The linker does not look for the environment variables mentioned in your code. They are resolved when the program runs: the operating system passes them to the process, and your code can read them. For example:

$HOME

And its content is, in my case:

/home/vagrant

You can see all your environment variables with the command printenv:

$ printenv
LANG=en_US.UTF-8
HOME=/home/vagrant      #This is the home directory
XDG_SESSION_ID=13
USER=vagrant
SHELL=/bin/bash
LANGUAGE=en_US:
PATH=/home/vagrant/.local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
_=/usr/bin/printenv
OLDPWD=/home/vagrant/holberton-system_engineering-devops

And you can see the content of one of them with echo:

$ echo $HOME
/home/vagrant

All these variables can be read inside your functions while the program runs, for example with the standard function getenv.
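
As a minimal sketch of reading one of these variables from C, the function below uses getenv from the standard library. The name env_or_default and its fallback parameter are assumptions made for this example.

```c
#include <stdlib.h>

/* env_or_default: returns the value of an environment variable,
 * or fallback when the variable is not set */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);

    if (value == NULL)
        return (fallback);
    return (value);
}
```

With the environment shown above, env_or_default("HOME", "/") would return /home/vagrant. The value is looked up each time the program runs, so the same executable can give a different answer for another user.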

About function definitions, the compiler already knows that we included the header stdio.h and that we used the function putchar. But it only knows the prototype that is declared in the header. The linker now searches for the body of the function putchar in the C standard library and connects it to our program.
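
Here is a small sketch of code that depends on the linker in this way. The function print_word is made up for this example; it calls putchar knowing only the prototype that comes from stdio.h.

```c
#include <stdio.h>

/* print_word: prints a string one character at a time, then a newline.
 * Returns the number of characters in the word. */
int print_word(const char *word)
{
    int count = 0;

    while (word[count] != '\0')
    {
        putchar(word[count]);
        count++;
    }
    putchar('\n');
    return (count);
}
```

Compiling this file with gcc -c works even though the body of putchar is nowhere in it. The body is only needed in the linking step, when the final executable is built.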

About me

I am a passionate software developer from Holberton School and a Psychologist from the National University. Throughout my life I have been developing valuable professional skills such as being a good listener, critical thinker, and team player. I have been consistently recognized as a very intelligent and empathic person. Whether in work or academic life, I want to create meaningful experiences and inspire my partners. I am consistently dedicated and curious.

If you want to create a connection with me, follow me on GitHub or Twitter.

I hope you enjoyed this reading!

Made by Natalia Vera Duran.