From human to binary: A basic guide through the compilation process in C.

Carlos Soria
Feb 4, 2022

When we write code in a programming language, we expect the computer to do what we want: to perform a sequence of actions that produces a result. The syntax of most programming languages is, more or less, human-readable; it uses ‘words’ and some symbols. But how do computers actually carry out our commands if they only understand binary? Most of us just took it for granted at some point. In this post I will give an overview of what is going on behind the curtain.

What we need is something like a translator, and in computing that is known as a compiler: a program whose input is written in a source language (a high-level language) and whose output is in a target language (a lower-level language such as machine code), which is finally turned into an executable.

Although the compilation process can be understood as a whole, it is actually made up of ‘smaller programs’, or steps, each with its own input and output.

A little BIT of history

For the C programming language, the quintessential compiler is GCC. It now supports several programming languages, but when it was first released in 1987 it only supported C. It was developed by the GNU Project (GCC stands for GNU Compiler Collection).

Preprocessing stage

This is the first step that our file goes through. The content of our code is expanded: comments are removed, the contents of all included header files are inserted, and every macro name is replaced by the code it stands for.
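To see macro replacement and comment removal in isolation, here is a small sketch that feeds a three-line fragment to the preprocessor on standard input. The fragment, the GREETING macro and the expanded.txt file name are hypothetical; -x c tells GCC that the input is C, and -P suppresses the line markers it would otherwise emit.

```shell
# Preprocess a tiny hypothetical fragment read from stdin and save the result.
gcc -E -P -x c - <<'EOF' > expanded.txt
#define GREETING "Hello world"
/* this comment will disappear */
const char *msg = GREETING;
EOF

cat expanded.txt   # GREETING has been replaced by the string, the comment is gone
```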

This is our source code, a traditional ‘Hello world’ program.
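The listing itself is not reproduced here. A minimal stand-in, assuming the file is called main.c and contains one comment so that the preprocessor has something to strip, could be created like this:

```shell
# Write a hypothetical main.c; the exact contents of the original listing are an assumption.
cat > main.c <<'EOF'
#include <stdio.h>

/* main - prints a greeting */
int main(void)
{
    printf("Hello world\n");
    return 0;
}
EOF

cat main.c   # show what was written
```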

Now the following command will be applied:

The -E option makes GCC stop after the preprocessing stage and output the result; the -o option just saves that output under a specific file name.

We now have two files: the original source and the preprocessed output.

If we open the output saved in main, we can observe that everything inside the header <stdio.h> has been included. At the very end of the file we find our main function, and the comments are gone.

Compiler stage

In this stage GCC converts the preprocessed result into assembly code. With the following command we can make it stop there and output only the assembly code.

This produces a new file with a .s extension. If we open it, we get the assembly code of our main.c file.

Assembler

Here our assembly code is converted into object code, that is, machine language. By executing the following command and opening the main.o file we can take a look at it.
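A sketch of this step, again assuming the source file is main.c: the -c option stops after assembling and writes main.o. Because object code is binary, the example dumps it with od rather than opening it as text.

```shell
# Stand-in source, created only if no main.c exists yet (an assumption for this sketch).
[ -f main.c ] || printf '%s\n' \
  '#include <stdio.h>' \
  '/* greet the world */' \
  'int main(void) { puts("Hello world"); return 0; }' > main.c

gcc -c main.c              # writes main.o, not yet executable
od -c main.o | head -n 5   # first bytes of the object file, shown as characters and escapes
```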

Linker

The role of this stage is to create the executable file. It also merges the object code of the several C files that together make up a single program, and links any libraries used.

After this our program is ready for execution. This time we let the compiler run completely.

The default output name is ‘a.out’, but we can set a custom name with the -o option. Also note that it is an executable file.

Finally, we get the computer to perform the task that we wanted: print a classic ‘Hello world’.
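A sketch of the full run, assuming the source file is main.c; the custom name hello is an arbitrary choice for this example.

```shell
# Stand-in source, created only if no main.c exists yet (an assumption for this sketch).
[ -f main.c ] || printf '%s\n' \
  '#include <stdio.h>' \
  '/* greet the world */' \
  'int main(void) { puts("Hello world"); return 0; }' > main.c

gcc main.c            # all four stages; the executable is named a.out by default
./a.out
gcc main.c -o hello   # same, but with a custom name
./hello
```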

If we are a little curious and open a.out with a text editor, we will see that our binary code has changed a little BIT.

Conclusion

Now we have a broad understanding of what happens in the fraction of a second between pressing ENTER and actually seeing the result. The real process is a BIT more complex, though: there are more ‘microprocesses’ involved, such as analyzing the semantics of our code and classifying its elements.

This post was meant to give you a little push to do more research when you feel like it; there is more fascinating stuff to see out there in the ‘spirit’ of the machines…
