What happens when you type “gcc main.c”?

Wescott Sharples
Sep 12, 2018

In this blog post, I’m going to break down the compilation process of the C programming language to the best of my ability.

The C compilation process has four main stages: preprocessing, compilation, assembly, and linking. The stages are separate steps, and they always run in that order, with each stage working on the output of the one before it.

What is “gcc main.c”?

When we type “gcc main.c”, “main.c” is the name of the file we want to compile. A “.c” file contains source code written in the C programming language. Our computer’s processor cannot execute C source code directly, which is why we need to compile it in the first place. Compilation is what takes us from a “high-level” programming language like C to the low-level machine code that the processor understands.

When using “gcc” in this context, we are invoking GCC, the GNU Compiler Collection, which is available on most Linux and Unix-like operating systems. When invoked, it performs preprocessing, compilation, assembly, and linking on whichever file we specify. gcc is really a driver that hands each stage to a separate tool (for example, the assembler as and the linker ld), and it has options that stop the process after a given stage. That illustrates the fact that each stage runs on its own and produces its own output.

The Preprocessor

Our main.c file is first processed by the preprocessor, which produces an intermediate, preprocessed version of the source that is then passed on to the compiler. The preprocessor removes comments from our code, replaces each #include line with the contents of the named header file, and expands all macros. The main.c file itself is left unchanged.

Take this C test file (creatively named “testfile.c”) as an example.

This file simply prints “Hello, Holberton.” followed by a new line. You can see that I have a multi-line comment, opened with “/*” before the text I want to comment out and closed with “*/” after it. Additionally, I have “#include <stdio.h>” at the top of my file. This line is a preprocessor directive that includes a header file: stdio.h, the header for the standard input/output library. It declares a bunch of ready-to-go functions such as “printf()”, which we use later in the code.

Once I run testfile.c through the preprocessor, it will get rid of the multi-line comment that appears before my “main” function. It will also paste in the contents of the stdio.h header.

To run testfile.c through the preprocessor alone, and not the compiler, assembler, or linker, we use the “-E” option with the gcc command. We also add “-o”, a flag that sets the name of gcc’s output file; without it, gcc -E prints the preprocessed code to the terminal.

Now you can see that the output of the preprocessor no longer contains our multi-line comment. It also contains a ton of code from stdio.h (and the other headers it includes) placed before our main function. A piece of it is pictured below.

The Compiler

Once preprocessed, our code is translated from C into assembly code. Eventually we want to turn main.c into binary, but the compiler first produces assembly, a human-readable form of the processor’s instructions, which the next stage then converts into binary. Below, we can see that testfile has been converted from C to assembly code using the “-S” flag with the gcc command.

See the gibberish-looking text when we look inside testfile? That is assembly. I can’t believe people program in that language.

The Assembler

The assembler converts the assembly code into binary (also called machine code or object code). Binary is made up entirely of 1s and 0s. To stop at this point, we run the gcc command with “-c”, which compiles and assembles the file without going on to the linking phase.

Less warns me when I try to look inside the object file, because it knows that I’m too much of a rookie to understand what it means. I’m going to go ahead and respond “y” for yes.

Less, our terminal pager program, is trying to render the binary as text. Bytes that are not printable characters are shown in caret notation, such as ^@ for a zero byte, which is why we see a bunch of @ and ^ signs surrounding “Hello, Holberton.” Here I was thinking assembly code was hard to read.

The Linker

The linker combines one or more object files into a single executable file. That way, if you are working on a program with other people, each of you can write separate files, but in the end there is one executable with all of your code combined. The linker also links in code from any libraries we use, such as the C standard library that provides printf().

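There are two types of linking: static and dynamic. With static linking, the linker copies the library code into the executable when the program is built. With dynamic linking, the executable only records which shared libraries it needs, and the operating system’s dynamic loader connects them when you run the program; this is what gcc does by default for the C standard library on Linux. Since linking is the final stage, we can just use the plain gcc command without any stage-stopping options.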

That’s it. Because we didn’t use “-o”, gcc names the executable “a.out”, which is the default on Ubuntu and other Unix-like systems. The default name can vary based on your operating system.

Let’s run our program.

Glorious.