Compiling Computer Programs: An Explanation of the Compilation Process

When you write code in a compiled language such as C, C++, or Objective-C, the computer can’t run what you write directly, because a processor only understands machine code, which is represented as 1’s and 0’s. So how do programmers who write C get the computer to understand their code? They use a compiler: a program that reads your source code and takes it through several steps to turn it into machine code. The compilation process has four stages: preprocessing, compiling, assembling, and linking. To see how we get from human-readable C code to code that the computer can run, we will walk through each stage using gcc, a free compiler created by the GNU Project.

The GNU Compiler Collection, or gcc for short, is a free and open source compiler for languages such as C, C++, Objective-C, Fortran, Ada, and Go. To better understand how a compiler works, we will use gcc to run through each step of the compilation process. (The commands below use cc, which on many systems is simply another name for the default C compiler, often gcc or clang.) The first step, preprocessing, gets your source code ready to be compiled. The preprocessor handles every line that begins with a #, known as a preprocessor directive. An #include directive is replaced by the contents of the named header file, so the header is merged with the file written by the programmer (for this exercise we will use a file named main.c). The preprocessor also expands macros, strips comments, and conditionally omits code according to directives such as #ifdef. To see just the result of preprocessing, run the following command, which prints the preprocessed source to the terminal:

cc -E main.c

The second step of compilation, oddly enough, is itself called compilation. In this step the compiler takes the preprocessed code and translates it into assembly code. Assembly code is an intermediate form between the human-readable C code in our main.c file and the eventual machine code that the computer will run. It is specific to the target processor architecture, so the same C file produces different assembly on different kinds of processor. The compiler may also apply optimizations at this stage, such as inlining small functions, when optimization is enabled. To stop after this step and save the assembly code in your current directory, type the following command:

cc -S main.c

This creates a file named main.s, which is human-readable and can be opened in any text editor.

In the next step, called assembly, a program called the assembler turns the assembly code into machine code, also known as object code. This converts the assembly code from words and numbers into the 1’s and 0’s that the processor can act on. To stop after this step and save the result in your directory, run the command:

cc -c main.c

The file created by this step is called main.o. It is a binary object file rather than text, so you can inspect it using either the hexdump or od command in your command line.

The final step in the compilation process is called linking. In this step the linker combines the object code created during assembly with any other object files and libraries the program needs, and produces an executable file. An object file on its own is incomplete: it can refer to functions, such as printf, whose code lives elsewhere. The linker resolves these references, arranges all the pieces into a single program, and writes the result to a new file. The command to run every step and fully compile the program is:

cc main.c

However, if we run the compiler without specifying an output name, it creates a file named a.out by default. To give the executable a more descriptive name that helps you remember what the program does, run the command:

cc main.c -o main

This command compiles the code and names the executable file main.

When we write a program in a compiled language like C, we need to run the source file through a compiler such as gcc before the computer can run it. The compiler takes four steps to get from your C code to an executable file: preprocessing, compilation, assembly, and linking. At the end of the process you will have your original source file, which can still be edited easily, as well as a new executable file that runs your program. We are fortunate to live in a time when programmers don’t have to write machine code by hand and compilers do the hard work for us. Still, it is important to understand how they work so that we know how to better optimize our code. Happy coding, and remember #cisfun.