Dissecting GCC

Afa Madza
Jan 18, 2018

The GNU Compiler Collection (GCC) is a compiler system developed by the GNU Project which supports a variety of programming languages. GCC is the standard compiler for most Unix-like operating systems and is also a key component of the GNU toolchain. In this article, I’ll explain what exactly happens at each step of the compilation process. In broad strokes, the sequence of events after a compilation command is given to GCC includes:

  • Preprocessing
  • Compilation
  • Assembly
  • Linking

Hello World!

As an example, we will examine each stage of compilation using the well-loved “Hello World” program, saved as main.c. The code is as follows:

#include <stdio.h>

int main (void)
{
  printf ("Hello, world!\n");
  return 0;
}

The first line of the program, #include <stdio.h>, is a preprocessor directive that pulls in the header file declaring the input/output routines our program uses. The next line, int main (void), defines the “entry point” of the program. Inside it, printf prints the characters between the quotation marks to “standard output.” The \n at the end is an escape sequence: the compiler turns it into a single newline character, so the two characters \ and n never appear in the output and the cursor moves to the next line instead. Finally, the program returns the integer 0 on the last line of code, since we declared main as returning an integer (int).

When compiling this program, GCC runs each stage automatically without showing what is going on behind the scenes. If you would like to see the individual steps, gcc -v main.c runs a verbose compilation that prints the commands being executed.

Preprocessor

The first stage of the compilation process uses the preprocessor to expand macros and the contents of included header files. This stage corresponds to the following command:

$ cpp main.c > main.i

The resulting file, main.i, contains the source code with all macros expanded and the contents of stdio.h inserted. By convention, preprocessed C files carry the .i extension, and gcc treats a .i file as C source that needs no further preprocessing.
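Since main.c defines no macros of its own, a smaller file makes the expansion easier to see. The sketch below uses a hypothetical file macro.c and a made-up macro GREETING, neither of which is part of the Hello World example. It relies on gcc -E, which runs only the preprocessor, and -P, which suppresses the line markers the preprocessor would otherwise emit.

```shell
# macro.c and GREETING are illustrative names, not part of main.c.
cat > macro.c <<'EOF'
#define GREETING "Hello, world!"
const char *message = GREETING;
EOF

# -E: stop after preprocessing; -P: leave line markers out of the output.
gcc -E -P macro.c > macro.i

# The #define line is gone, and the macro name has been replaced
# by its replacement text.
cat macro.i
```

The same substitution happens to every macro that stdio.h defines, which is part of why main.i is so much longer than main.c.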

Compiler

In this stage, the compiler proper translates the preprocessed source code into assembly language for a specific processor. To convert the preprocessed C source to assembly, run:

$ gcc -Wall -S main.i

Note that running gcc with the -S option stops after this stage, so no object file is created. The resulting file is main.s. To view the assembly language stored in the file, run:

$ cat main.s

Assembler

As its name suggests, the assembler converts assembly language to machine code and generates an object file. When the assembly source file calls external functions, the assembler leaves their addresses undefined so that the linker can fill them in during the next step of the process. To generate the object file with the assembler, run:

$ as main.s -o main.o

The -o option specifies that the output should be written to the file main.o. This file contains the machine code for our program. As mentioned earlier, any references to external functions, such as printf, are still undefined at this point.

Linker

In this final stage, the linker combines object files to create an executable. An executable usually needs many external functions from the system and C run-time libraries. The exact paths depend on the system and GCC version; on an i686 installation of GCC 3.3.1, for instance, the full command to link our program is:

$ ld -dynamic-linker /lib/ld-linux.so.2 /usr/lib/crt1.o \
    /usr/lib/crti.o /usr/lib/gcc-lib/i686/3.3.1/crtbegin.o \
    -L/usr/lib/gcc-lib/i686/3.3.1 main.o -lgcc -lgcc_eh \
    -lc -lgcc -lgcc_eh /usr/lib/gcc-lib/i686/3.3.1/crtend.o \
    /usr/lib/crtn.o

Thankfully, we never need to type that behemoth of a command directly, because gcc handles the linking process transparently when we run:

$ gcc main.o

This command links the object file main.o with the C standard library and produces an executable file called a.out. To run the Hello World program, enter:

$ ./a.out

If everything worked, the output looks like this:

$ ./a.out
Hello, world!

Now that we understand how much is involved at each stage of the compilation process, we can truly appreciate the simplicity of typing gcc main.c into a terminal and letting GCC do the heavy lifting.

Source: http://www.network-theory.co.uk/docs/gccintro/gccintro_87.html

Afa Madza

West Point grad turned aspiring Software Engineer. USMA ’16 | Holberton School Batch 5