What happens when you type gcc main.c?

Before we answer that question, it’s important to distinguish between two types of programming languages: interpreted languages and compiled languages. Interpreted languages are executed in source code form by an interpreter. JavaScript, Perl, and PHP are some examples of languages that are typically interpreted. Compiled languages, on the other hand, must first be translated by a compiler into machine code that the processor can run directly. C is an example of a compiled language.

GCC, the GNU Compiler Collection, is a free-software collection of compilers for C as well as other programming languages. C programs are compiled in four main steps: preprocessing, compiling, assembling, and linking. We’ll go through each of the four steps for a classic program, hello world. Here is our main.c file, containing the C code to print “Hello, World!”:

Hello, World!

The first step is preprocessing. In this step, lines that start with # are interpreted as preprocessor directives. Some common examples are #include and #define. In the case of our hello world program, the #include <stdio.h> line is replaced with the contents of stdio.h, which include the declaration of printf. To view the output of the preprocessing stage alone, type gcc -E main.c:

The command for the preprocessing stage

Comments are also removed during the preprocessing stage. You can see from the last few lines of the output here that the comment section at the top of our file is gone, along with the tabs used for indenting:

The results of the preprocessing stage

The second stage is called compiling. In this stage, your preprocessed code is translated into assembly language instructions. Assembly is an intermediate, human-readable language that is closer to machine language. To stop after this stage, type gcc -S main.c:

The command for the compiling stage

This command produces main.s, which contains the assembly language instructions:

The contents of main.s

The third stage is called assembling. In this stage, the assembly code is turned into an object code file, which contains the machine code version of the source code. The name of this file matches the name of your .c file, but with the .o extension. This means that our main.c program produces the main.o object file. To stop after this stage, type gcc -c main.c:

The command for the assembling stage

This results in a binary object code file, which is not human readable. The first few lines of our main.o file are as follows:

The contents of main.o

The fourth and final stage is called linking. In this stage, the linker combines the object files into a binary executable. For our hello world program, main.o only contains a reference to printf, and the linker resolves that reference against the C standard library, where the function's object code lives. If you run gcc without specifying an output file with the -o option, it will create one for you called a.out.

Running gcc main.c produces the executable file a.out

To run your program, you need to type ./a.out:

The program works and displays the intended message!

To summarize, the four steps involved in compilation are preprocessing, compiling, assembling, and linking. Fortunately, a single gcc main.c command runs all of these steps for you. I hope that after reading this, you have a better understanding of what is going on under the hood.

If this article helped you, please follow me for updates. Thanks for reading!

Software Engineering Student/TA at Holberton School
