How Does C Compilation Work?

Brennan D Baraban
Sep 13, 2018


I wish I could tell you why in the world the GCC logo is a full-grown bull hatching out of an egg. By https://gcc.gnu.org/img/gccegg.svg [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0/)], via Wikimedia Commons.

For those of us practiced in writing C programs, gcc is likely a familiar command. A single gcc command turns a C source file (identified by its .c extension) into an executable program. Yet what is happening behind the scenes, between the moment gcc takes a file and the moment it magically hands back an executable, like a Gob Bluth trick?

Source: https://giphy.com/gifs/arrested-development-gob-bluth-excited-n0WvhHFTpihk4

C is what is referred to as a compiled language. In other words, before a C program can run, a compiler must translate its source code into machine code that the computer's processor can execute directly (more on that later).

The GNU Compiler Collection (GCC) is one such compiler for the C language. That’s right, that confounding, incredible, majestic, full-grown egg-bull represents a powerful program that can turn raw C code into easy-to-use, executable files. In fact, C is only one of several languages GCC supports; others include C++, Fortran, and Go. Today, GCC is the standard compiler on many Unix-like systems, including most Linux distributions.

Getting from source code to an executable takes four steps: preprocessing, compiling, assembling, and linking. By default, gcc runs all four for us. Let’s break down each step in turn. As we go, we’ll use the example program main.c, a simple C program that prints a message defined in a macro named MESSAGE.

1. Preprocessor

The preprocessor reads through the source code first and prepares it for compilation through three main tasks. First, it removes all comments from the code, meaning text enclosed in /* */ or following // on a line. Second, it replaces each #include directive, such as #include "example_header.h" or #include <stdio.h>, with the full contents of the named header file. Finally, every macro defined with #define is replaced by its specified value wherever it is used.

We can view the results of the preprocessor by running gcc with the -E option (gcc -E main.c), which stops the process after this first step and prints the preprocessed code to the terminal.

Here, we see that the preprocessor stripped main.c of its comments and replaced the macro MESSAGE with its defined value.
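We can also catch the preprocessor in the act from inside C itself. The sketch below is a hypothetical file, not the article's main.c: the MESSAGE text and the function names are made up for illustration. It relies on the # (stringizing) operator, which turns a macro argument into a string literal, so each function returns the text the preprocessor produced: comments are already gone before any macro runs, and MESSAGE is expanded only when we ask for it.

```c
/* A hypothetical macro standing in for the article's MESSAGE. */
#define MESSAGE "Hello, World!"

/* STR turns its argument into a string literal exactly as written.
 * XSTR adds one level of indirection: its argument is macro-expanded
 * first, and only the result is handed to STR. */
#define STR(x) #x
#define XSTR(x) STR(x)

/* Returns "MESSAGE": the macro's name, not yet expanded. */
const char *message_unexpanded(void) {
    return STR(MESSAGE);
}

/* Returns "\"Hello, World!\"": the preprocessor has already replaced
 * MESSAGE with its value, quotation marks included. */
const char *message_expanded(void) {
    return XSTR(MESSAGE);
}

/* Returns "a b": the comment was turned into a space before any macro
 * ran, and stringizing collapses the whitespace to a single space. */
const char *comment_removed(void) {
    return STR(a /* this comment never reaches the compiler */ b);
}
```

Calling puts(message_expanded()) from a main function would print "Hello, World!" with the quotation marks, which is exactly the text substituted for MESSAGE. Running gcc -E on this file shows the same substitutions directly in its output.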

2. Compiler

After preprocessing, the filtered code is passed to the compiler, which translates it into corresponding assembly code. Assembly code, or assembly language (often abbreviated asm), is a low-level programming language whose instructions map closely, often one to one, onto the machine code instructions of a particular processor architecture.

To halt the process at this second step, we can run gcc on our example file main.c with the -S option. gcc then writes the assembly code to a file named main.s, where we can view the resulting, mostly unreadable mess.
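To see the compiler's translation without the noise of printf and the standard headers, it can help to run gcc -S on something tinier. The file below is a hypothetical add.c, not part of the article's example. Compiling it with gcc -O2 -S add.c writes add.s; on x86-64 Linux the body of add typically comes out as the two instructions shown in the comment, though the exact output depends on the GCC version, the target architecture, and the optimization level.

```c
/* add.c: a hypothetical one-function file for trying out gcc -S.
 *
 * On x86-64 Linux, `gcc -O2 -S add.c` typically produces an add.s whose
 * function body looks roughly like this (AT&T syntax):
 *
 *     add:
 *         leal    (%rdi,%rsi), %eax    # eax = a + b
 *         ret                          # return to the caller
 *
 * surrounded by assembler directives such as .globl add. Without -O2 the
 * output is longer, because the arguments are first stored to the stack
 * and then reloaded.
 */
int add(int a, int b) {
    return a + b;
}
```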

3. Assembler

From the compiler, the new assembly code is passed to the assembler. The assembler does just what its name suggests: it assembles the code into object code. Where assembly code is a human-readable stand-in for machine instructions, object code is the machine code itself, stored in binary, together with bookkeeping the linker will need later, such as the names of functions the code calls but does not define.

Our main.c file’s assembly code wasn’t unreadable enough for me. Let’s run gcc with the -c option, which halts the process after the assembly step and leaves the object code in a file named main.o.
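Object code may be unreadable to us, but it is not formless. On Linux (and most other Unix-like systems, though not macOS, which uses the Mach-O format), gcc writes object files and executables in the ELF format, whose header starts with the bytes 0x7F, 'E', 'L', 'F' and records what kind of file it is. The sketch below reads just enough of that header to tell an object file like main.o apart from a linked executable. The function name elf_type is our own, and the code assumes an ELF system.

```c
#include <stdio.h>
#include <string.h>

/* Reads the e_type field of an ELF header and returns it, e.g.
 * 1 for a relocatable object file (such as main.o),
 * 2 for a traditional executable,
 * 3 for a shared object or a position-independent executable (PIE).
 * Returns -1 if the file cannot be read or is not ELF. */
int elf_type(const char *path) {
    unsigned char header[18];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(header, 1, sizeof header, f);
    fclose(f);

    /* Every ELF file starts with the magic bytes 0x7F 'E' 'L' 'F'.
     * The literal is split so that \x7f is not parsed as \x7fE. */
    if (n < sizeof header || memcmp(header, "\x7f" "ELF", 4) != 0)
        return -1;

    /* Byte 5 (EI_DATA) gives the byte order of the fields that follow;
     * e_type is the 2-byte field right after the 16-byte e_ident array. */
    if (header[5] == 1)                       /* little-endian */
        return header[16] | (header[17] << 8);
    if (header[5] == 2)                       /* big-endian */
        return (header[16] << 8) | header[17];
    return -1;
}
```

For example, after gcc -c main.c, elf_type("main.o") should return 1. Run on the a.out from the linking step, it typically returns 3 where GCC builds position-independent executables by default, as many Linux distributions configure it to, and 2 otherwise.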

Well, at least the formatting is prettier?

4. Linker

Preprocessed, compiled, and assembled, the object code is finally ready to be turned into an executable. For this last step, gcc hands the code to the linker, which takes all the object files and libraries passed to it and links them together into a single executable file, connecting each function call to the code that defines it.

Within the scope of our example, main.c is compiled without any other source or object files, so the linker only has to combine its object code with the C standard library (which supplies functions such as printf) and the small startup routine that calls main. At last, for the grand finale, let’s run the whole shebang (no computing reference intended here), gcc main.c without any options, to preprocess, compile, assemble, and link the program all at once.
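The linker's core job is matching each use of a name to the one place it is defined. The sketch below shows that split between declaration and definition; the function names triple and triple_plus_one are made up for illustration. The compiler only needs the declaration to generate a call, the same way a program that calls printf only needs the declaration in <stdio.h>. Finding the actual code can be left for later.

```c
/* A declaration: enough for the compiler to generate a call to triple(),
 * in the same way <stdio.h> declares printf for programs that use it. */
int triple(int x);

/* Uses triple() before its definition appears. */
int triple_plus_one(int v) {
    return triple(v) + 1;
}

/* The definition. Here it sits in the same file, but it could just as
 * well live in a separate file (say, a hypothetical triple.c) compiled on
 * its own with gcc -c. Linking the two object files together would then
 * be what connects the call above to this code. If no object file or
 * library defined triple, the GNU linker would stop with an
 * "undefined reference" error. */
int triple(int x) {
    return 3 * x;
}
```

Here triple_plus_one(4) returns 13.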

…what happened?

Since we didn’t specify an executable name (which can be done with the -o option, as in gcc main.c -o main), GCC stored the executable in a file called a.out by default. To run our program, we’ll use the command ./a.out, which runs the executable in the current working directory.

And voilà! [Insert terrified Shelley Duvall screaming here.]
