GCC: What are the steps when you compile a file

Alexandre Dutertre
Feb 6, 2022

When you start coding in C, you are happy to see the result of what you wrote appear on the terminal. But you don’t necessarily think about what happens behind the scenes, and that’s what I am going to talk about today.

First, GCC stands for GNU Compiler Collection. When we compile a “.c” file using gcc, there are four steps:

  1. Preprocessing;
  2. Compilation;
  3. Assembly;
  4. Linking.

Here is the code that will be used for the examples.

A simple code to print “Hello, World” followed by a new line

Preprocessing

During the preprocessing stage, the compiler calls on another program, the preprocessor, to handle the directives (lines starting with #, such as #include and #define). That program copies the content of the header files (“.h” files) and pastes it into the source code file, and it expands the macros. It also replaces every comment with blank space.

To stop after the preprocessing step, we use gcc -E file.c.

gcc: GNU project C and C++ compiler
-E: Stop after the preprocessing stage; do not run the compiler proper. The output is in the form of preprocessed source code, which is sent to the standard output.

The preprocessed code is printed on the terminal. To keep it in a “.i” file, we can use gcc -E file.c -o file.i. Here is what the file’s content looks like:

The beginning of the file with the content of stdio.h
The end of the file with my written code

As we can see, the comments are gone: only blank space is left where they used to be.

Compilation

During the compilation step, the compiler converts the preprocessed code to assembly language. It’s an intermediate language that humans can read and that is specific to the target processor.

To stop after the compilation, we use gcc -S file.c.

gcc: GNU project C and C++ compiler
-S: Stop after the stage of compilation proper; do not assemble. The output is in the form of an assembler code file for each non-assembler input file specified.

This step generates a “.s” file.

Part of the assembly code

Assembly

At that point, we have a file that is still human-readable, and it needs to be translated into machine language (i.e. binary code). That’s what this step is for: the assembler converts the assembly code into machine code, which is also known as object code.

To stop after that step, we use gcc -c file.c. It generates a “.o” file.

gcc: GNU project C and C++ compiler
-c: Compile or assemble the source files, but do not link. The linking stage simply is not done. The ultimate output is in the form of an object file for each source file.

When we take a look at the generated file, we can see that it’s harder to read, but we can still make out the text that we want to print on the terminal.

The content of the object file

If we open our “main.o” file with an editor like Sublime Text, for example, the content is displayed in hexadecimal, which is simply a more compact way of showing the binary data.

The content of the object file in hexadecimal

Linking

This is the final step. Here, the linker takes all the object files (if we divided our code into several “.c” files) and merges them into a single program by resolving the references between them.

If we used functions stored in libraries, the linker also connects them to our program so that it knows where to find said functions. There are two types of libraries:

  • Static libraries (.a on Unix-like systems, .lib on Windows): the library code that the program uses is copied into the binary file, which increases its size;
  • Dynamic libraries (.so on Unix-like systems, .dll on Windows): only the name of the library is recorded in the binary file, and the library itself is loaded when the program runs.

By default, gcc uses dynamic libraries when they are available.

To compile our “.c” file all the way to an executable, we can either run gcc file.c, in which case the executable is called “a.out” by default, or give it a name of our choice with gcc -o exe_name file.c.

gcc: GNU project C and C++ compiler
-o: Place output in file file. This applies regardless to whatever sort of output is being produced, whether it be an executable file, an object file, an assembler file or preprocessed C code.
The different options to make the executable

It is possible to keep a file for each step by running gcc file.c -save-temps.

gcc: GNU project C and C++ compiler
-save-temps: Store the usual "temporary" intermediate files permanently; place them in the current directory and name them based on the source file
A file for each step

Conclusion

To sum up, gcc turns a “.c” file into an executable by:

  • Preprocessing: removing the comments and putting the header files in the source code;
  • Compiling: converting the code into assembly language;
  • Assembling: converting the assembly language into binary code;
  • Linking: putting all the files together with the libraries into an executable.


Alexandre Dutertre

Student of Holberton School in Laval, France (Cohort #17)